CONVERGENCE RATES FOR POSTERIOR DISTRIBUTIONS AND ADAPTIVE ESTIMATION

By Tzee-Ming Huang 
Iowa State University 

The goal of this paper is to provide theorems on convergence rates 
of posterior distributions that can be applied to obtain good conver- 
gence rates in the context of density estimation as well as regression. 
We show how to choose priors so that the posterior distributions con- 
verge at the optimal rate without prior knowledge of the degree of 
smoothness of the density function or the regression function to be 
estimated. 

1. Introduction. Bayesian methods have been used for nonparametric in- 
ference problems, and many theoretical results have been developed to inves- 
tigate the asymptotic properties of nonparametric Bayesian methods. So far, 
the positive results are on consistency and convergence rates. For example, 
Doob (1949) proved the consistency of posterior distributions with respect to 
the joint distribution of the data and the prior under some weak conditions, 
and Schwartz (1965) extended Doob's result to Bayes decision procedures 
with possibly nonconvex loss functions. For the frequentist version of consis- 
tency, see Diaconis and Freedman (1986) for a review on consistency results 
on tail-free and Dirichlet priors. Barron, Schervish and Wasserman (1999) 
gave some conditions to achieve the frequentist version of consistency in gen- 
eral. Ghosal, Ghosh and Ramamoorthi (1999) also gave a similar consistency 
result and applied it to Dirichlet mixtures. 

For convergence rates, there are some general results by Ghosal, Ghosh 
and van der Vaart (2000) and Shen and Wasserman (2001). However, there 
are few results on adaptive estimation in the study of posterior convergence 
rates. Belitser and Ghosal (2003) dealt with adaptive estimation in the
infinite normal mean set-up. In this paper, we also have results on adaptive
estimation, but these are done in the density estimation and regression
setups.

The goal of this paper is to develop theorems on convergence rates for
posterior distributions which can be used for adaptive estimation. In this
paper we have theorems on convergence rates in two contexts: density
estimation and regression. In either case, we consider the Bayesian
estimation of some function $f$ (a density function or a regression
function) based on a sample $(Z_1, \ldots, Z_n)$ and are interested in the
convergence rates for the posterior distributions for $f$.

Below is the specific problem setup. Suppose that when $f$ is given,
$(Z_1, \ldots, Z_n)$ is a random sample from a distribution with density
$p_f$ with respect to a measure $\mu$ on a sample space $(S, \mathcal{B})$,
$f_0$ is the true value for $f$, and $f_0$ belongs to some function space
$\mathcal{F}$. Suppose that $\pi$ is a prior on $\mathcal{F}$ and
$B_d(s_n) = \{f \in \mathcal{F} : d(f, f_0) \le s_n\}$ is an $s_n$
neighborhood of $f_0$ with respect to the metric $d$, where $d$ is the
Hellinger distance in the density estimation case and is the $L_2$ distance
in the regression case.

We would like to show that the posterior probability

(1) $\pi(B_d(s_n)^c \mid Z_1, \ldots, Z_n) = \frac{\int_{B_d(s_n)^c} \prod_{i=1}^n p_f(Z_i)\,d\pi(f)}{\int \prod_{i=1}^n p_f(Z_i)\,d\pi(f)}$

converges to zero in probability, and the rate $s_n$ is as good as if the
degree of smoothness of $f_0$ were known. This is known as the adaptive
estimation problem.

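For the purpose of adaptive estimation, we take $\mathcal{F}$ to be
$\bigcup_{j \in J} \mathcal{F}_j$, where $J$ is a countable index set (not
necessarily a set of integers) and the $\mathcal{F}_j$'s are function spaces
of different degrees of smoothness. A natural way to construct priors on
$\mathcal{F}$ is to consider sieve priors. A sieve prior is a prior $\pi$
of the following form:

$\pi = \sum_{j \in J} a_j \pi_j,$

where $a_j \ge 0$, $\sum_{j \in J} a_j = 1$ and each $\pi_j$ is a prior
defined on $\mathcal{F}$ but supported on $\mathcal{F}_j$. To make it easier
to specify the $\pi_j$'s, we assume that each $\mathcal{F}_j$ is
finite-dimensional and can be represented as
$\{f_{\theta,j} : \theta \in \Theta_j\}$ for some parameter space
$\Theta_j$. We also assume that each $\pi_j$ is induced by a prior
$\tilde\pi_j$ defined on $\Theta_j$. Then the posterior probability in (1)
can be written as $U_n/V_n$, where

$U_n = \sum_{j \in J} a_j \int_{B_{d,j}(s_n)^c} \prod_{i=1}^n \frac{p_{f_{\theta,j}}(Z_i)}{p_{f_0}(Z_i)}\,d\tilde\pi_j(\theta)$

and

$V_n = \sum_{j \in J} a_j \int_{\Theta_j} \prod_{i=1}^n \frac{p_{f_{\theta,j}}(Z_i)}{p_{f_0}(Z_i)}\,d\tilde\pi_j(\theta),$

where $B_{d,j}(s_n) = \{\theta \in \Theta_j : d(f_{\theta,j}, f_0) \le s_n\}$.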
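
To make the ratio of a restricted integral to a full integral concrete, here is a minimal sketch that evaluates the posterior mass outside a ball when every model prior is replaced by finitely many atoms. The discretization and the names `posterior_mass_outside`, `log_lik`, `weights` and `dist` are assumptions made only for this illustration; they are not part of the setup above.

```python
import math

def posterior_mass_outside(log_lik, weights, dist, s_n):
    """Posterior mass of {d(f, f_0) > s_n} under a discrete sieve prior.

    log_lik[i]: log-likelihood of the sample at atom i (an atom is a pair (j, theta))
    weights[i]: prior mass of atom i, i.e. a_j times the mass of theta within model j
    dist[i]:    distance d(f_{theta,j}, f_0) for atom i
    """
    # Subtracting the maximum rescales numerator and denominator by the
    # same factor, so the ratio is unchanged and exp() cannot overflow.
    m = max(log_lik)
    lr = [w * math.exp(l - m) for w, l in zip(weights, log_lik)]
    u_n = sum(v for v, d in zip(lr, dist) if d > s_n)  # atoms outside the ball
    v_n = sum(lr)                                      # all atoms
    return u_n / v_n
```

Dividing every likelihood by the likelihood at the true function, as is usual when bounding numerator and denominator separately, would leave this ratio unchanged for the same reason that subtracting the maximum does.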

This paper is organized as follows. Section 2 gives a theorem on
convergence rates in the density estimation case and some examples of
applying the theorem to obtain adaptive rates. Section 3 gives the
corresponding theorem and an example in the context of regression. Proofs
are in Section 4.



2. Density estimation. 



2.1. Theorem. This section gives a convergence rate theorem for Bayesian
density estimation. The setup is as described in Section 1, with $p_f = f$
and $d$ being the Hellinger metric $d_H$, which is defined by

$d_H(f, g) = \sqrt{\int (\sqrt{f} - \sqrt{g})^2\,d\mu}.$
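
As a quick numerical illustration of this metric, the sketch below approximates the Hellinger distance between two densities on an interval by a midpoint rule. The function name `hellinger`, the grid size and the choice of Lebesgue measure on an interval are assumptions made for the example.

```python
import math

def hellinger(f, g, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of d_H(f, g) for densities on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(total * h)

# Example: uniform density versus the density 2x on [0, 1].
# The squared distance has the closed form 2 - 4*sqrt(2)/3.
d = hellinger(lambda x: 1.0, lambda x: 2.0 * x)
```

With this normalization (no factor 1/2 in front of the integral) the distance between two densities never exceeds $\sqrt{2}$.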



To make the posterior probability $U_n/V_n \to 0$, we need some conditions
to give bounds for $U_n$ and $V_n$.

To bound $U_n$, we will make an assumption about the structure of each
parameter space $\Theta_j$, and then specify the $a_j$ accordingly. Let
$\|\cdot\|_\infty$ denote the sup norm,

$B_{d_H,j}(\eta, r) = \{\theta \in \Theta_j : d_H(f_{\eta,j}, f_{\theta,j}) \le r\}$

and $N(B, \delta, d')$ denote the $\delta$-covering number of a set $B$ with
respect to a metric $d'$, which is defined as the smallest number of
$\delta$-balls (with respect to $d'$) that are needed to cover the set $B$.
Here is the assumption.

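Assumption 1. For each $j \in J$, there exist constants $A_j$ and $m_j$
such that $A_j \ge 0.0056$, $m_j \ge 1$, and for any $r > 0$,
$\delta \le 0.0056r$, $\theta \in \Theta_j$,

$N(B_{d_H,j}(\theta, r), \delta, d_{j,\infty}) \le \left(\frac{A_j r}{\delta}\right)^{m_j},$

where $d_{j,\infty}(\theta, \eta)$ is defined as
$\|\log f_{\theta,j} - \log f_{\eta,j}\|_\infty$ for all
$\theta, \eta \in \Theta_j$.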

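Suppose Assumption 1 holds. We specify the $a_j$'s in the following way:

(2) $a_j = \alpha \exp\left(-\left(1 + \frac{1 - 4\gamma}{8}\right)\eta_j\right),$

where $\alpha$ is a normalizing constant so that $\sum_j a_j = 1$,
$\gamma = 0.1975$ is the solution to $0.13\gamma/\sqrt{1 - 4\gamma} = 0.0056$,
and

(3) $\eta_j = \frac{4m_j}{1 - 4\gamma} \log\left(\frac{46.2 A_j \sqrt{1 - 4\gamma}}{\gamma}\right) + \frac{8C_j}{1 - 4\gamma}$

for some $C_j$ such that $C_j > 0$ and $\sum_j e^{-C_j} \le 1$.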
Note:

1. Assumption 1 is based on Assumption 1 in Yang and Barron (1998) so
that their results can be applied here. The constants $A_j$ and $m_j$ can be
figured out based on the local structure of $\Theta_j$. In many cases, $m_j$
can be taken as the dimension of $\Theta_j$, as stated in Lemma 2.

2. The constants $C_j$ are here to make sure that $\sum_j a_j < \infty$,
since $a_j \le \alpha e^{-C_j}$. Indeed, we may take $\eta_j$ to be some
large constant times $m_j \log A_j$, if this choice makes $\{a_j\}$ summable.
Also, specific constant values are given in (2) and (3) for calculational
convenience. Different choices are possible.

To find a bound for $V_n$, we will use Lemma 1 of Shen and Wasserman
(2001), which says we can bound $V_n$ from below if the prior puts enough
probability on a small neighborhood of the true density $f_0$. To guarantee
enough prior probability around $f_0$, we proceed as follows.

1. Find a model $\mathcal{F}_{j_n}$ that receives enough weight $a_{j_n}$
and is close to $f_0$, that is, there exists $\beta_n$ in $\Theta_{j_n}$ so
that $f_{\beta_n,j_n}$ is close to $f_0$.

2. Make sure the prior $\tilde\pi_{j_n}$ puts enough probability on a
neighborhood of $\beta_n$. This helps $\pi$ put some probability around
$f_0$ since $a_{j_n}$ is not too small.

For the first step, we simply assume that it is possible.

Assumption 2. There exist $j_n$ and $\beta_n \in \Theta_{j_n}$ such that

(4) $\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n})) + \frac{\eta_{j_n}}{n} \le \varepsilon_n^2$

for some sequence $\{\varepsilon_n\}$, where
$D(f\|g) = \int f \log(f/g)\,d\mu$, $V(f\|g) = \int f (\log(f/g))^2\,d\mu$
and $\eta_{j_n}$ is as defined in (3) with $A_{j_n}$ and $m_{j_n}$ in
Assumption 1.

Before going to assumptions for the second step, we add one more
condition here to allow us to use neighborhoods that are different but
comparable to the neighborhoods in Lemma 1 of Shen and Wasserman (2001).

Assumption 3. For the $j_n$ in Assumption 2, there exists a metric
$d_{j_n}$ on $\Theta_{j_n}$ such that

(5) $\int f_0 \left(\log \frac{f_{\eta,j_n}}{f_{\theta,j_n}}\right)^2 d\mu \le K_0'\, d_{j_n}^2(\eta, \theta)$

for all $\eta, \theta$ in $\Theta_{j_n}$, and

$D(f_0\|f_{\theta,j_n}) \le K_0''\, V(f_0\|f_{\theta,j_n})$

for all $\theta \in \Theta_{j_n}$, where $K_0'$ and $K_0''$ are constants
independent of $n$.

The following two assumptions are for the second step. 





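Assumption 4. For $j_n$, $A_{j_n}$, $m_{j_n}$, $\beta_n$, $\varepsilon_n$
and $d_{j_n}$ in Assumptions 1-3, there exist constants $K_4$ and
$b_1 \ge 0$ such that

$N(\Theta_{j_n}, \varepsilon_n, d_{j_n}) \le (A_{j_n}^{b_1} K_4)^{m_{j_n}},$

where $N(\Theta_{j_n}, \varepsilon_n, d_{j_n})$ is the
$\varepsilon_n$-covering number of $\Theta_{j_n}$ with respect to the metric
$d_{j_n}$.

Assumption 5. For $j_n$, $A_{j_n}$, $m_{j_n}$, $\beta_n$, $\varepsilon_n$
and $d_{j_n}$ in Assumptions 1-3, there exist constants $K_5$ and
$b_2 \ge 0$ such that for any $\theta_1 \in \Theta_{j_n}$,

$\frac{\tilde\pi_{j_n}(B_{d_{j_n},j_n}(\theta_1, \varepsilon_n))}{\tilde\pi_{j_n}(B_{d_{j_n},j_n}(\beta_n, \varepsilon_n))} \le (A_{j_n}^{b_2} K_5)^{m_{j_n}},$

where $B_{d_{j_n},j_n}(\theta, \varepsilon_n)$ denotes the $d_{j_n}$-ball
centered at $\theta$ with radius $\varepsilon_n$ in $\Theta_{j_n}$.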

Note:

1. Assumption 4 is here to give more control of the overall size of
$\Theta_{j_n}$ in terms of the $\varepsilon_n$-covering number (Assumption 1
essentially deals with the local structure). This control is to prevent the
total prior probability from getting spread out so much that each
neighborhood gets little probability.

2. Assumption 5 is to make sure that the prior supported on
$\Theta_{j_n}$ puts enough probability near $\beta_n$ compared to some other
neighborhood.

Finally, we assume the following. 



Assumption 6. As $n \to \infty$,

$\varepsilon_n \to 0$ and $n\varepsilon_n^2 \to \infty$.

Now we have the following theorem. 



Theorem 1. Suppose that Assumptions 1-6 hold. Then with $a_j$ defined
in (2), there exist positive constants $c$, $K_1$ and $K_2$ that are
independent of $n$, so that

(6) $\pi(B_{d_H}(K_1\varepsilon_n)^c \mid X_1, \ldots, X_n) \le c \exp(-K_2 n \varepsilon_n^2)$

except on a set of probability converging to zero.



The proof of Theorem 1 is given in Section 4. 



2.2. Example: spline basis. In this section, we assume that $\log f_0$ is
in the Sobolev space
$W_\infty^s[0,1] = \{g : \|D^s g\|_{L_\infty[0,1]} < \infty\}$, where $s$ is
a positive integer and $\|\cdot\|_{L_\infty[0,1]}$ is the essential sup norm
with respect to the Lebesgue measure on $[0,1]$. We will see that using the
sieve prior given below, the posterior distribution converges at the rate
$n^{-s/(1+2s)}$ in Hellinger distance.




Lemma 1. Suppose that $\log f_0 \in W_\infty^s[0,1]$ as defined above and
$\mu$ is the Lebesgue measure on $[0,1]$. Let
$J = \{(k, q, L) : k, q \text{ and } L \text{ are integers}, k \ge 0, q \ge 1 \text{ and } L \ge 1\}$.
For $j = (k, q, L) \in J$, let $m_j = k + q$, and for
$i \in \{1, \ldots, m_j\}$, let $B_{j,i}$ be the normalized B-spline
associated with the knots $y_i, \ldots, y_{i+q}$ as in Definition 4.19,
page 124 in Schumaker (1981), where

$(y_1, \ldots, y_q, y_{q+1}, \ldots, y_{q+k}, y_{q+k+1}, \ldots, y_{2q+k}) = (\underbrace{0, \ldots, 0}_{q \text{ times}}, 1/(1+k), \ldots, k/(1+k), \underbrace{1, \ldots, 1}_{q \text{ times}}).$

Define

$\Theta_j = \{\theta \in \mathbb{R}^{m_j} : \theta' 1_{m_j} = 0, \|D^r \log f_{\theta,j}\|_{L_\infty[0,1]} \le L, \forall r \in \{0, 1, \ldots, q-1\}\},$

where $1_{m_j} = (1, \ldots, 1)' \in \mathbb{R}^{m_j}$,
$\log f_{\theta,j} = -\psi(\theta) + \theta' B$,
$\psi(\theta) = \log \int_0^1 e^{\theta' B(x)}\,dx$ is the normalizing
constant, and $B = (B_{j,1}, \ldots, B_{j,m_j})'$. Define $\eta_j$ as in (3)
with

(7) $A_j = 19.28\sqrt{q}\,(2q+1)\,9^{q-1}(L+1)e^{L/2} + 0.06$ and $C_j = m_j + L$;

define $a_j$ as in (2). Let $\tilde\pi_j$ be the Lebesgue measure on
$\Theta_j$. Let $\pi_j$ be the induced prior of $\tilde\pi_j$ and
$B_{d_H}(s_n)$ denote the $s_n$ Hellinger neighborhood of $f_0$, as defined
in Section 1. Then for the prior $\pi = \sum_j a_j \pi_j$, the posterior
probability $\pi(B_{d_H}(s_n)^c \mid X_1, \ldots, X_n)$ converges to zero in
probability for some $s_n \propto n^{-s/(1+2s)}$.
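
The knot layout in Lemma 1 can be checked numerically. The sketch below builds the knot vector for a given $(k, q)$ and evaluates the order-$q$ B-splines with the Cox-de Boor recursion, on the assumption that this standard recursion reproduces the normalized B-splines of Schumaker (1981). The function names `knots`, `bspline` and `basis` are hypothetical, and indices are 0-based in the code.

```python
def knots(k, q):
    """Knot vector of length 2q + k: q zeros, interior knots i/(1+k), q ones."""
    return [0.0] * q + [i / (1 + k) for i in range(1, k + 1)] + [1.0] * q

def bspline(i, r, y, x):
    """Order-r B-spline on knots y[i], ..., y[i+r], evaluated at x."""
    if r == 1:
        return 1.0 if y[i] <= x < y[i + 1] else 0.0
    left = right = 0.0
    # Terms with a zero-length knot span are dropped (the usual 0/0 = 0 rule).
    if y[i + r - 1] > y[i]:
        left = (x - y[i]) / (y[i + r - 1] - y[i]) * bspline(i, r - 1, y, x)
    if y[i + r] > y[i + 1]:
        right = (y[i + r] - x) / (y[i + r] - y[i + 1]) * bspline(i + 1, r - 1, y, x)
    return left + right

def basis(k, q, x):
    """Values at x of the m_j = k + q basis functions for j = (k, q, L)."""
    y = knots(k, q)
    return [bspline(i, q, y, x) for i in range(k + q)]
```

For $x$ in $[0, 1)$ the returned values are non-negative and sum to one, which is the property behind taking $T_1 = 1$ in the sup-norm bound used later to verify Assumption 1.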

The proof of Lemma 1 is given in Section 4.

Note:

1. Log-spline models have been used in density estimation and give good
convergence rates; see Stone (1990), for example.

2. The prior does not depend on $s$, but it adapts to the smoothness
parameter $s$.

3. Here we take $\tilde\pi_j$ to be the Lebesgue measure on $\Theta_j$,
but we may also take $\tilde\pi_j$ to be some measure that has a density
$q_j$ with respect to the Lebesgue measure on $\Theta_j$. As long as
$\|\log q_j\|_\infty$ is uniformly bounded in $j$, the convergence rates
should be the same.

4. $C_j = m_j + L$ is just one possible choice. In general, if we choose
$\{C_j\}$ so that $\sum_j e^{-C_j} < \infty$ and $C_{j_n} \to \infty$ no
faster than $m_{j_n} \log A_{j_n}$, where $j_n$ is as in Assumption 2, then
it should be a good choice.

5. To figure out $A_j$ and $m_j$, the following lemma, from Lemma 1 by
Yang and Barron (1998), is useful.

Lemma 2. Suppose that $\{S_l : l \in \Lambda\}$ is a countable collection
of linear function spaces on $[0,1]$. Suppose that for each $S_l$ there is a
basis $\{B_{l,1}, \ldots, B_{l,m_l}\}$. Suppose that there exist constants
$T_1$ and $T_2$ such that for
$\theta = (\theta_1, \ldots, \theta_{m_l}) \in \mathbb{R}^{m_l}$,

(8) $\left\|\sum_{i=1}^{m_l} \theta_i B_{l,i}\right\|_\infty \le T_1 \max_i |\theta_i|$

and

(9) $\left\|\sum_{i=1}^{m_l} \theta_i B_{l,i}\right\|_2 \ge \frac{T_2}{\sqrt{m_l}} \sqrt{\sum_{i=1}^{m_l} \theta_i^2},$

where $\|\cdot\|_2$ denotes the $L_2$ norm with respect to the Lebesgue
measure on $[0,1]$. Let

(10) $\log f_{\theta,j} = -\psi(\theta) + \sum_{i=1}^{m_l} \theta_i B_{l,i},$

where
$\psi(\theta) = \log \int_0^1 \exp\left(\sum_{i=1}^{m_l} \theta_i B_{l,i}(x)\right) dx$
is the normalizing constant. Suppose that $1 \in S_l$ for all
$l \in \Lambda$, $J = \{(l, L) : l \in \Lambda, L \text{ is a positive integer}\}$
and for $j = (l, L) \in J$,

$\Theta_j \subset \{\theta \in \mathbb{R}^{m_l} : \|\log f_{\theta,j}\|_\infty \le L\}.$

Then Assumption 1 holds with

(11) $A_j = 19.28\,\frac{T_1}{T_2}\,(L+1)e^{L/2} + 0.06$ and $m_j = m_l$.

2.3. Example: Haar basis. In this section, we assume that $\log f_0$ is a
continuous function on $[0,1]$ with $\|\log f_0\|_\infty \le M_0$, and we
approximate $\log f_0$ using the Haar basis
$\{1_{[0,1]}(x), \psi_{j_1,k_1}(x) : 0 \le j_1, 0 \le k_1 \le 2^{j_1} - 1\}$,
where $\psi_{j_1,k_1}(x) = 2^{j_1/2}\psi^*(2^{j_1}x - k_1)$ and
$\psi^*(x) = 1_{[0,0.5]}(x) - 1_{[0.5,1]}(x)$. We also assume that the
coefficients of the $L_2$ expansion of $\log f_0$ for the Haar basis,
denoted by $\theta_{j_1,k_1}$, satisfy the following condition:

(12) $\sum_{j_1 \ge 0} (2^{j_1+1} - 1)^{2\alpha} \sum_{k_1=0}^{2^{j_1}-1} \theta_{j_1,k_1}^2 \le H_0^2$

for some $H_0 > 0$ and $\alpha \in (0,1)$. According to Barron, Birge and
Massart [(1999), page 330], the above condition on the Haar basis
coefficients corresponds to the Besov space $B_{2,2}^\alpha[0,1]$. The Besov
space $B_{2,2}^\alpha[0,1]$ is indeed the Sobolev space $W_2^\alpha[0,1]$,
so the optimal convergence rate is $n^{-\alpha/(1+2\alpha)}$ in
$L_2$-distance. We will see that using the sieve prior given below, the
posterior distribution converges at the rate
$n^{-\alpha/(1+2\alpha)}(\log n)^{1/2}$ in Hellinger distance, which is
close to the optimal rate $n^{-\alpha/(1+2\alpha)}$ within a
$(\log n)^{1/2}$ factor:
Lemma 3. Suppose that $\log f_0$ is in the space specified above and
$\mu$ is the Lebesgue measure on $[0,1]$. Let
$J = \{(l, L) : l \text{ and } L \text{ are integers}, l \ge 0, L \ge 1\}$.
For $j = (l, L) \in J$, let $m_j = 2^{l+1}$. Reindex the Haar basis in the
following way:

$\{\psi_{j_1,k_1} : 0 \le j_1 \le l, 0 \le k_1 \le 2^{j_1} - 1\} = \{B_{j,i} : 1 \le i \le m_j - 1\}.$

Then for $\theta \in \mathbb{R}^{m_j - 1}$, define
$\log f_{\theta,j} = -\psi(\theta) + \theta' B$, where
$\psi(\theta) = \log \int_0^1 e^{\theta' B(x)}\,dx$ is the normalizing
constant and $B = (B_{j,1}, \ldots, B_{j,m_j - 1})'$. Define

$\Theta_j = \{\theta \in \mathbb{R}^{m_j - 1} : \|\theta' B\|_\infty \le L\}$

and let $\tilde\pi_j$ be the Lebesgue measure on $\Theta_j$. Define $a_j$
and $\eta_j$ according to (2) and (3) with

(13) $A_j = 19.28 \cdot 2^{(l+1)/2}(2L+1)e^{L} + 0.06$ and $C_j = m_j + L$.

Let $\pi_j$ be the induced prior of $\tilde\pi_j$ and $B_{d_H}(s_n)$ denote
the $s_n$ Hellinger neighborhood of $f_0$, as defined in Section 1. Then for
the prior $\pi = \sum_j a_j \pi_j$, the posterior probability
$\pi(B_{d_H}(s_n)^c \mid X_1, \ldots, X_n)$ converges to zero in probability
for some $s_n \propto n^{-\alpha/(1+2\alpha)}(\log n)^{1/2}$.
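
The reindexing in Lemma 3 and the orthonormality of the Haar functions can be checked with a short sketch. The names `haar`, `haar_index` and `inner` are hypothetical, and the half-open intervals used for the mother wavelet are an assumption of the code that differs from closed intervals only on a set of Lebesgue measure zero.

```python
def haar(j1, k1, x):
    """Haar function 2^(j1/2) * psi(2^j1 * x - k1), psi = +1 on [0, 0.5), -1 on [0.5, 1)."""
    t = 2 ** j1 * x - k1
    if 0.0 <= t < 0.5:
        return 2 ** (j1 / 2)
    if 0.5 <= t < 1.0:
        return -(2 ** (j1 / 2))
    return 0.0

def haar_index(l):
    """All pairs (j1, k1) with 0 <= j1 <= l and 0 <= k1 <= 2^j1 - 1."""
    return [(j1, k1) for j1 in range(l + 1) for k1 in range(2 ** j1)]

def inner(u, v, n=1024):
    """L2[0, 1] inner product by a midpoint rule on a dyadic grid.

    The midpoints never hit a breakpoint of a Haar function of level
    below log2(n), so no jump is smeared by the quadrature.
    """
    return sum(u((i + 0.5) / n) * v((i + 0.5) / n) for i in range(n)) / n
```

For a resolution level `l`, `haar_index(l)` has $2^{l+1} - 1$ entries, which together with the constant function gives the model dimension $2^{l+1}$.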

The proof of Lemma 3 is given in Section 4.

Note:

1. For the choice of $a_j$ and $\tilde\pi_j$, see the note for Lemma 1.

2. To specify $A_j$ and $m_j$, Lemma 2 is no longer applicable since
$T_1$ in (8) cannot be taken as a constant in this case. We use the
following lemma [from Lemma 2 by Yang and Barron (1998)] instead.

Lemma 4. Suppose that $\{S_l : l \in \Lambda\}$ is a countable collection
of linear function spaces on $[0,1]$ and that for each $l$ there exists a
constant $K_l > 0$ such that for all $h \in S_l$,

(14) $\|h\|_\infty \le K_l \|h\|_2.$

Suppose that each $S_l$ is spanned by a bounded and linearly independent
(under $L_2$ norm) basis $1, B_{l,1}, \ldots, B_{l,m_l}$. For
$\theta \in \mathbb{R}^{m_l}$, define
$\log f_{\theta,j} = -\psi(\theta) + \sum_{i=1}^{m_l} \theta_i B_{l,i}$,
where
$\psi(\theta) = \log \int_0^1 \exp\left(\sum_{i=1}^{m_l} \theta_i B_{l,i}(x)\right) dx$.
Suppose that $J = \{(l, L) : l \in \Lambda, L \text{ is a positive integer}\}$
and for each $j \in J$,

(15) $\Theta_j \subset \{\theta \in \mathbb{R}^{m_l} : \|\log f_{\theta,j}\|_\infty \le 2L\}.$

Then Assumption 1 holds with

(16) $A_j = 19.28 K_l (2L+1)e^{L} + 0.06$ and $m_j = m_l + 1$.
In the spline density estimation result, the convergence rate is optimal
and we have full adaptation. The Haar basis result is different: the
convergence rate involves an extra log factor, which comes from the $K_l$
in (16). In the spline case there is no $K_l$, and $A_j$ is approximately a
constant when $j = j_n$ for large $n$ ($j_n$ is the index for one of the
best models at sample size $n$). In the Haar case, $A_j$ grows with the
model dimension $m_j$ when $j = j_n$ (like $\sqrt{m_j}$, by (13)) because of
the factor $K_l$.

3. Regression. 

3.1. Theorem. In this section, a Bayesian convergence rate theorem is
given in the context of regression. The setup is as described in Section 1,
with $Z_i = (X_i, Y_i)$, where $Y_i = f(X_i) + \epsilon_i$, $X_i$ and
$\epsilon_i$ are independent, $X_i$ is distributed according to some
probability measure $\mu_X$ and $\epsilon_i$ is normally distributed with
mean zero and known variance $\sigma^2$. Thus the density $p_f$ (with
respect to $\mu_X \times$ Lebesgue measure on $\mathbb{R}$) is

$p_f(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y - f(x))^2/(2\sigma^2)}.$

The metric $d$ is the $L_2(\mu_X)$ metric. We also assume that
$\|f_0\|_\infty$ is bounded by a known constant $M$.
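
For Gaussian errors with a common variance, the Kullback-Leibler divergence between two such regression densities at a fixed design point equals the squared difference of the regression functions divided by twice the variance, which is why the $L_2(\mu_X)$ metric is natural here. The sketch below checks this standard identity by numerical integration; the helper names `normal_pdf` and `kl_at_x`, the integration window and the grid size are assumptions of the example.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def kl_at_x(m0, m1, sigma, n=20_000, width=12.0):
    """Midpoint-rule KL divergence between N(m0, sigma^2) and N(m1, sigma^2).

    m0 and m1 play the roles of f_0(x) and f(x) at a fixed design point x.
    """
    lo = m0 - width * sigma
    h = 2 * width * sigma / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        # log of the density ratio p_{f_0}(x, y) / p_f(x, y)
        log_ratio = ((y - m1) ** 2 - (y - m0) ** 2) / (2 * sigma ** 2)
        total += normal_pdf(y, m0, sigma) * log_ratio * h
    return total

# Closed form for comparison: (m0 - m1)^2 / (2 sigma^2).
```

Integrating the pointwise identity over the design distribution gives the divergence between the two joint densities as the squared $L_2(\mu_X)$ distance divided by $2\sigma^2$.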

To bound $U_n$ and $V_n$, we modify the assumptions in Theorem 1 in the
following way. Let

$B_{L_2(\mu_X),j}(\eta, r) = \{\theta \in \Theta_j : \|f_{\eta,j} - f_{\theta,j}\|_{L_2(\mu_X)} \le r\}.$

Assumption 1 is replaced with the following.

Assumption 7. For each $j$, there exist constants $A_j$ and $m_j$ such
that $A_j \ge 0.0056$, $m_j \ge 1$, and for any $r > 0$,
$\delta \le 0.0056r$, $\theta \in \Theta_j$,

$N(B_{L_2(\mu_X),j}(\theta, r), \delta, d_{j,\infty}) \le \left(\frac{A_j r}{\delta}\right)^{m_j},$

where $d_{j,\infty}(\theta, \eta) = \|f_{\theta,j} - f_{\eta,j}\|_\infty$
for all $\theta, \eta \in \Theta_j$.



Suppose Assumption 7 holds. We specify the weights $a_j$ in the following
way to give an upper bound for $U_n$:
( ( 1 0.0056\ 

(17) aj = a exp 1—11 + ^— ^ H — 1^- 

where $\alpha$ is a normalizing constant so that $\sum_j a_j = 1$ and

(18) $\eta_j = \frac{4m_j}{C_{1,M,\sigma}(1 - 4\gamma)} \log(1072.5 A_j) + C_j \max\left(1, \frac{8}{C_{1,M,\sigma}(1 - 4\gamma)}\right)$

for some $C_j$ such that $C_j > 0$ and $\sum_j e^{-C_j} \le 1$.

Assumption 2 is replaced with the following assumption. 
Assumption 8. There exist $j_n$ and $\beta_n \in \Theta_{j_n}$ such that

(19) $\max(D(p_{f_0}\|p_{f_{\beta_n,j_n}}), V(p_{f_0}\|p_{f_{\beta_n,j_n}})) + \frac{\eta_{j_n}}{n} \le \varepsilon_n^2$

for some sequence $\{\varepsilon_n\}$, where $\eta_{j_n}$ is as defined in
(18) with $A_{j_n}$ and $m_{j_n}$ in Assumption 7.

Assumption 3 is replaced with the following. 

Assumption 9.

(20) $\|f_{\theta,j_n} - f_{\eta,j_n}\|_{L_2(\mu_X)}^2 \le K_0'\, d_{j_n}^2(\theta, \eta)$ for all $\theta, \eta \in \Theta_{j_n}$.

Assumptions 4-6 remain unchanged except that "Assumptions 1-3" should
be changed to "Assumptions 7-9."
Now we have the following theorem. 

Theorem 2. Suppose that $\|f_{\theta,j}\|_\infty \le M$ for all $j$ and
$\theta \in \Theta_j$. Suppose that Assumptions 7-9 and Assumptions 4-6 hold
with the reference change made as mentioned above. Then with $a_j$ defined
in (17), there exists a positive constant $K_1$ such that
$\pi(B_{L_2(\mu_X)}(K_1\varepsilon_n)^c \mid X_1, \ldots, X_n)$ converges to
zero in probability. Here $B_{L_2(\mu_X)}(K_1\varepsilon_n)$ denotes the
$K_1\varepsilon_n$ neighborhood of $f_0$ with respect to the $L_2(\mu_X)$
metric, as defined in Section 1.

The proof of Theorem 2 is given in Section 4. 

3.2. An example. In this section, we consider
$f_0 \in W_\infty^s[0,1] = \{g : \|D^s g\|_{L_\infty[0,1]} < \infty\}$ and
approximate $f_0$ using a spline basis. The minimax rate for this space in
$L_2$ metric, according to Stone (1982), is $n^{-s/(1+2s)}$. We will see
that, using the sieve prior given below, the posterior distribution
converges at the optimal rate $n^{-s/(1+2s)}$ in $L_2$ distance.

Lemma 5. Suppose that $f_0 \in W_\infty^s[0,1]$ and
$\|f_0\|_\infty \le M$, where $M$ is a known constant. Suppose that $\mu_X$
is the Lebesgue measure on $[0,1]$. Let
$J = \{(k, q, L) : k, q \text{ and } L \text{ are integers}; k \ge 0, q \ge 1, L \ge 1\}$.
For $j = (k, q, L) \in J$, let $m_j = k + q$, and for
$i \in \{1, \ldots, m_j\}$, let $B_{j,i}$ be the normalized B-spline
associated with the knots $y_i, \ldots, y_{i+q}$, where

$(y_1, \ldots, y_q, y_{q+1}, \ldots, y_{q+k}, y_{q+k+1}, \ldots, y_{2q+k}) = (\underbrace{0, \ldots, 0}_{q \text{ times}}, 1/(1+k), \ldots, k/(1+k), \underbrace{1, \ldots, 1}_{q \text{ times}}).$

Define

$\Theta_j = \{\theta \in \mathbb{R}^{m_j} : \|D^r f_{\theta,j}\|_{L_\infty[0,1]} \le L, \forall r \in \{0, 1, \ldots, q-1\} \text{ and } \|f_{\theta,j}\|_\infty \le M\},$

where for $\theta = (\theta_1, \ldots, \theta_{m_j}) \in \mathbb{R}^{m_j}$,

(21) $f_{\theta,j} = \sum_{i=1}^{m_j} \theta_i B_{j,i}.$

Define $\eta_j$ according to (18) with

(22) $A_j = 9.64\sqrt{q}\,(2q+1)\,9^{q-1} + 0.06$ and $C_j = m_j + L$,

and define $a_j$ according to (17). Let $\tilde\pi_j$ be the Lebesgue
measure on $\Theta_j$. Let $\pi_j$ be the induced prior of $\tilde\pi_j$ and
$B_{L_2(\mu_X)}(s_n)$ denote the $s_n$ $L_2(\mu_X)$ neighborhood of $f_0$,
as defined in Section 1. Then for the prior $\pi = \sum_j a_j \pi_j$, the
posterior probability $\pi(B_{L_2(\mu_X)}(s_n)^c \mid X_1, \ldots, X_n)$
converges to zero in probability for some $s_n \propto n^{-s/(1+2s)}$.



The proof for Lemma 5 is given in Section 4. 

Here is a lemma that is useful for verifying Assumption 7 to prove Lemma 5. 



Lemma 6. Suppose that $\{S_j : j \in J\}$ is a countable collection of
linear function spaces on $[0,1]$. Suppose that for each $S_j$ there is a
basis $\{B_{j,1}, \ldots, B_{j,m_j}\}$. Suppose that there exist constants
$T_1$ and $T_2$ such that for
$\theta = (\theta_1, \ldots, \theta_{m_j}) \in \mathbb{R}^{m_j}$,

(23) $\left\|\sum_{i=1}^{m_j} \theta_i B_{j,i}\right\|_\infty \le T_1 \max_i |\theta_i|$

and

(24) $\left\|\sum_{i=1}^{m_j} \theta_i B_{j,i}\right\|_2 \ge \frac{T_2}{\sqrt{m_j}} \sqrt{\sum_{i=1}^{m_j} \theta_i^2},$

where $\|\cdot\|_2$ denotes the $L_2$ norm with respect to the Lebesgue
measure on $[0,1]$. Suppose that for $j \in J$,
$\Theta_j \subset \mathbb{R}^{m_j}$ and $f_{\theta,j}$ is as defined in (21).
Then Assumption 7 holds with

(25) $A_j = 9.64\,\frac{T_1}{T_2} + 0.06.$

The proof is a straightforward modification of the proof for Lemma 1 of
Yang and Barron (1998).



4. Proofs. 
4.1. Proof of Theorem 1. We prove Theorem 1 by giving bounds for
$U_n$ and $V_n$, respectively, and then combining the bounds to show that
$U_n/V_n$ converges to zero. For finding an upper bound for $U_n$, we would
like to use the following lemma, which is a modified version of Lemma 0 by
Yang and Barron (1998).

Lemma 7. Suppose that Assumption 1 holds and

$\frac{\xi}{m_j} \ge \frac{4}{1 - 4\gamma} \log\left(\frac{46.2 A_j \sqrt{1 - 4\gamma}}{\gamma}\right).$

Then

$P^*_{f_0}\left[\text{for some } \theta \in \Theta_j,\ \frac{1}{n}\sum_{i=1}^n \log \frac{f_{\theta,j}(X_i)}{f_0(X_i)} \ge -\gamma\, d_H^2(f_0, f_{\theta,j}) + \frac{\xi}{n}\right] \le 15.1 \exp\left(-\frac{(1 - 4\gamma)\xi}{8}\right),$

where $P^*_{f_0}$ is the outer measure for $P_{f_0}$.

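Proof. Suppose that Assumption 1 holds. We will show that for any
$r > 0$ and $\delta \le 0.056r$,

(26) $N(B_{d_H,j}(r), \delta, d_{j,\infty}) \le \left(\frac{3A_j r}{\delta}\right)^{m_j},$

where $B_{d_H,j}(r) = \{\theta \in \Theta_j : d_H(f_0, f_{\theta,j}) \le r\}$
is as defined in Section 1. Then the result in Lemma 7 follows from Lemma 0
in Yang and Barron (1998).

Below is the proof of (26). Fix $\epsilon > 0$. Let
$\theta^* \in \Theta_j$ be such that

$d_H(f_0, f_{\theta^*,j}) \le \epsilon r + \inf_{\theta \in \Theta_j} d_H(f_0, f_{\theta,j}).$

Then for $\theta \in \Theta_j$,

$d_H(f_0, f_{\theta,j}) \ge \frac{1}{2}\left(d_H(f_0, f_{\theta^*,j}) + d_H(f_0, f_{\theta,j})\right) - \frac{\epsilon r}{2} \ge \frac{1}{2}\, d_H(f_{\theta,j}, f_{\theta^*,j}) - \frac{\epsilon r}{2},$

so we have

$B_{d_H,j}(r) = \{\theta \in \Theta_j : d_H(f_0, f_{\theta,j}) \le r\} \subset \{\theta \in \Theta_j : d_H(f_{\theta,j}, f_{\theta^*,j}) \le (2 + \epsilon)r\} = B_{d_H,j}(\theta^*, (2 + \epsilon)r),$

where $B_{d_H,j}(\theta^*, (2 + \epsilon)r)$ is as defined in Section 2.1.
Take $\epsilon = 1$; then by Assumption 1, for any $r > 0$ and
$\delta \le 0.056r$, (26) holds, so by Lemma 0 in Yang and Barron (1998) the
proof for Lemma 7 is complete. □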
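Suppose Assumption 1 holds. Let $a_j$ and $\eta_j$ be as specified in (2)
and (3), and take $\xi_j = \eta_j + \gamma n s_n^2/2$. Then by Lemma 7 we
have

$U_n \le \alpha e^{-\gamma n s_n^2/2} \sum_j \exp\left(-\frac{(1 - 4\gamma)\eta_j}{8}\right) \le \alpha e^{-\gamma n s_n^2/2}$

except on a set of probability no greater than

$\sum_j 15.1 \exp\left(-\frac{(1 - 4\gamma)\xi_j}{8}\right) = 15.1 \exp\left(-\frac{(1 - 4\gamma)\gamma n s_n^2}{16}\right) \sum_j \exp\left(-\frac{(1 - 4\gamma)\eta_j}{8}\right) \le 15.1 \exp\left(-\frac{(1 - 4\gamma)\gamma n s_n^2}{16}\right).$

That is, an upper bound for $U_n$ is given by

(27) $P_{f_0}\left[U_n > \alpha e^{-\gamma n s_n^2/2}\right] \le 15.1 \exp\left(-\frac{(1 - 4\gamma)\gamma n s_n^2}{16}\right).$

To find a lower bound for $V_n$, we will use Lemma 1 of Shen and Wasserman
(2001). Let

$B_D(r) = \{g : D(f_0\|g) \le r, V'(f_0\|g) \le r\},$

where $V'(f\|g) = \int f (\log(f/g) - D(f\|g))^2\,d\mu$. Here is the lemma.

Lemma 8. For $t_n > 0$,

$P_{f_0}\left[V_n \le \frac{1}{2}\,\pi(B_D(t_n))\, e^{-2nt_n}\right] \le \frac{2}{nt_n}.$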

Suppose that Assumptions 2-5 hold. Let
$B_{d_{j_n},j_n}(\theta, \varepsilon_n)$ denote the $d_{j_n}$-ball centered
at $\theta$ with radius $\varepsilon_n$ in $\Theta_{j_n}$ and define

$B_{D,j_n}(t_n) = \{\theta \in \Theta_{j_n} : D(f_0\|f_{\theta,j_n}) \le t_n, V(f_0\|f_{\theta,j_n}) \le t_n\}.$

We will first show that

(28) $B_{d_{j_n},j_n}(\beta_n, \varepsilon_n) \subset B_{D,j_n}(t_n)$

for some $t_n \propto \varepsilon_n^2$ and that

(29) $\tilde\pi_{j_n}(B_{d_{j_n},j_n}(\beta_n, \varepsilon_n)) \ge \left(\frac{1}{A_{j_n}^{b_1+b_2} K_4 K_5}\right)^{m_{j_n}}.$

Then we will deduce a lower bound for $\pi(B_D(t_n))$ based on (28) and (29)
to apply Lemma 8.
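To prove (28), note that for
$\theta \in B_{d_{j_n},j_n}(\beta_n, \varepsilon_n)$, by Assumptions 2 and 3
we have

$V(f_0\|f_{\theta,j_n}) \le 2\varepsilon_n^2 + 2K_0'\varepsilon_n^2$

and

$D(f_0\|f_{\theta,j_n}) \le K_0''\, V(f_0\|f_{\theta,j_n}) \le 2K_0''(1 + K_0')\varepsilon_n^2.$

Therefore, (28) holds for
$t_n = 2\max(1, K_0'')(1 + K_0')\varepsilon_n^2 \stackrel{\mathrm{def}}{=} K'\varepsilon_n^2$.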

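To prove (29), note that by Assumption 4 there exist
$\theta_1, \ldots, \theta_{d^*} \in \Theta_{j_n}$ such that

$d^* \le (A_{j_n}^{b_1} K_4)^{m_{j_n}}$ and $\bigcup_{i=1}^{d^*} B_{d_{j_n},j_n}(\theta_i, \varepsilon_n) \supset \Theta_{j_n},$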

SO 



> 



where the last inequality follows from Assumption 5. 
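It is clear that

$\pi(B_D(t_n)) \ge a_{j_n} \tilde\pi_{j_n}(B_{D,j_n}(t_n)) \stackrel{(28)}{\ge} a_{j_n} \tilde\pi_{j_n}(B_{d_{j_n},j_n}(\beta_n, \varepsilon_n)) \stackrel{(29)}{\ge} a_{j_n} \left(\frac{1}{A_{j_n}^{b_1+b_2} K_4 K_5}\right)^{m_{j_n}},$

so by Lemma 8, we have that except on a set of probability no greater than
$2/(nt_n)$,

$V_n \ge \frac{1}{2}\, e^{-2nt_n} a_{j_n} \tilde\pi_{j_n}(B_{D,j_n}(t_n))$

$\ge \frac{\alpha}{2}\, e^{-2nt_n} \exp\left(-\left(1 + \frac{1 - 4\gamma}{8}\right)\eta_{j_n}\right) \left(\frac{1}{A_{j_n}^{b_1+b_2} K_4 K_5}\right)^{m_{j_n}}$

(30) $\ge \frac{\alpha}{2} \exp\left(-2nt_n - \eta_{j_n}\left(1 + \frac{1 - 4\gamma}{8} + b_1 + b_2 + (\log(K_4K_5))_+\right)\right)$

$\stackrel{(4)}{\ge} \frac{\alpha}{2} \exp\left(-2nt_n - n\varepsilon_n^2\left(1 + \frac{1 - 4\gamma}{8} + b_1 + b_2 + (\log(K_4K_5))_+\right)\right) = \frac{\alpha}{2}\, e^{-Kn\varepsilon_n^2},$

where $K = 2K' + 1 + (1 - 4\gamma)/8 + b_1 + b_2 + (\log(K_4K_5))_+$. Here
the third inequality follows from the fact that

$\frac{\eta_j}{m_j} \ge \frac{4}{1 - 4\gamma} \log\left(\frac{46.2 A_j \sqrt{1 - 4\gamma}}{\gamma}\right) \ge \max(1, \log A_j)$

for all $j$.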

Now we will bound $U_n/V_n$ by combining (27) and (30). In (27) set
$s_n^2 = 4K\varepsilon_n^2/\gamma$. Then

$\pi(B_{d_H}(s_n)^c \mid X_1, \ldots, X_n) = \frac{U_n}{V_n} \le 2\exp(-Kn\varepsilon_n^2)$

except on a set of probability no greater than

$15.1 \exp\left(-\frac{(1 - 4\gamma)Kn\varepsilon_n^2}{4}\right) + \frac{2}{K'n\varepsilon_n^2},$

which converges to zero because $n\varepsilon_n^2 \to \infty$ by
Assumption 6.



4.2. Proof of Lemma 1. We will verify Assumptions 1-6 for the spline
example. To verify Assumption 1, we will apply Lemma 2. From page 143
(4.80) in Schumaker (1981),

$\left\|\sum_{i=1}^{m_j} \theta_i B_{j,i}\right\|_\infty \le \max_i |\theta_i|.$

Since $m_j$ and $B_{j,i}$ depend on $(k, q)$ but not on $L$, we set
$l = (k, q)$, $m_l = m_j$ and $B_{l,i} = B_{j,i}$. Then (8) holds with
$T_1 = 1$. To check (9), note that from (4.79) and (4.86) in Schumaker
(1981), we have that for each $i \in \{1, \ldots, m_l\}$,

$|\theta_i| \le (2q+1)\,9^{q-1}(y_{i+q} - y_i)^{-1/2} \left\|\sum_{i'=1}^{m_l} \theta_{i'} B_{l,i'}\right\|_{L_2[y_i, y_{i+q}]},$

where $y_1, \ldots, y_{2q+k}$ are as defined in Lemma 1 and
$L_2[y_i, y_{i+q}]$ is the $L_2$ metric with respect to the Lebesgue measure
on $[y_i, y_{i+q}]$. Since $y_{i+q} - y_i \ge 1/(1+k)$,

$\sum_{i=1}^{m_l} \theta_i^2 \le (2q+1)^2\, 9^{2(q-1)}(k+1) \sum_{i=1}^{m_l} \left\|\sum_{i'=1}^{m_l} \theta_{i'} B_{l,i'}\right\|_{L_2[y_i, y_{i+q}]}^2 \le (2q+1)^2\, 9^{2(q-1)}(k+q)\,q \left\|\sum_{i=1}^{m_l} \theta_i B_{l,i}\right\|_2^2,$

which implies that (9) holds with $T_2 = 1/(\sqrt{q}\,(2q+1)\,9^{q-1})$. By
Lemma 2, Assumption 1 holds for $A_j$ and $m_j$ in (7). Also note that for
the $C_j$ specified in (7),
$\sum_j e^{-C_j} = e^{-2}/(1 - e^{-1})^3 < 1$ as required.
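
The closed form for the sum of $e^{-C_j}$ can be confirmed numerically. The sketch below sums over a truncated index set, with $C_j = k + q + L$ for $k \ge 0$, $q \ge 1$, $L \ge 1$; the truncation level `max_index` and the function name are assumptions of the example, and the neglected tail is of order $e^{-60}$.

```python
import math

def spline_weight_sum(max_index=60):
    """Truncated sum of exp(-C_j) with C_j = k + q + L over k >= 0, q >= 1, L >= 1."""
    total = 0.0
    for k in range(0, max_index):
        for q in range(1, max_index):
            for L in range(1, max_index):
                total += math.exp(-(k + q + L))
    return total

# Product of three geometric series: 1/(1 - e^-1) for k, e^-1/(1 - e^-1) for q and L.
closed_form = math.exp(-2) / (1 - math.exp(-1)) ** 3
```

Both quantities are roughly 0.54, comfortably below the bound of 1 required of the constants $C_j$.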

To verify Assumption 2, we need to find $j_n$ and $\beta_n$. Take
$j_n = (k_n, q^*, L^*)$, where $\{k_n\}$ is a sequence of positive integers
such that

$c_3 n^{1/(1+2s)} \le k_n \le c_4 n^{1/(1+2s)}$ for all $n$

for some constants $c_3$ and $c_4$, $q^* = s + 1$, and

$L^* = \min\{L : L \text{ is a positive integer}, L \ge 2^s + \alpha_{q^*}M_0 + M_0\},$

where $M_0 = \max_{0 \le r \le s} \|D^r \log f_0\|_{L_\infty}$. To control
the error $\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n}))$, we use
the following fact.

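Fact 1. For $j$ such that $q \ge s + 1$, there exists
$\beta \in \mathbb{R}^{m_j}$ such that

(31) $\|D^r(\log f_0 - \log f_{\beta,j})\|_\infty \le \alpha_q \left(\frac{1}{k+1}\right)^{s-r} M_0$ for $0 \le r \le s - 1$,
$\|D^s \log f_{\beta,j}\|_\infty \le \alpha_q M_0.$

This fact follows from (6.50) in Schumaker (1981) and the result that for
$\theta = (\theta_1, \ldots, \theta_{m_j}) \in \mathbb{R}^{m_j}$,

$\left|\log \int_0^1 \exp\left(-\log f_0(x) + \sum_{i=1}^{m_j} \theta_i B_{j,i}(x)\right) f_0(x)\,dx\right| \le \left\|\log f_0 - \sum_{i=1}^{m_j} \theta_i B_{j,i}\right\|_\infty.$

From the fact, there exists $\beta_n \in \mathbb{R}^{m_{j_n}}$ such that

$\|\log f_0 - \log f_{\beta_n,j_n}\|_\infty \le \alpha_{q^*} M_0 \left(\frac{1}{k_n+1}\right)^s.$

Since $D(f_0\|f_{\beta_n,j_n})$ and $V(f_0\|f_{\beta_n,j_n})$ are bounded by
$\|\log f_0 - \log f_{\beta_n,j_n}\|_\infty^2$, we have

$\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n})) + \frac{\eta_{j_n}}{n} \le \left(\alpha_{q^*} M_0 \left(\frac{1}{k_n+1}\right)^s\right)^2 + c_2\, \frac{m_{j_n}}{n} \le c_1 n^{-2s/(1+2s)}$

for some constants $c_1$ and $c_2$. So Assumption 2 holds if
$\beta_n \in \Theta_{j_n}$ and

(32) $\varepsilon_n^2 = c_1 n^{-2s/(1+2s)}.$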
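
The choice of $k_n$ of order $n^{1/(1+2s)}$ balances an approximation term that decays like $k^{-2s}$ against a complexity term that grows like $k/n$. The sketch below minimizes this proxy over integers, with all constants set to 1 as a simplifying assumption; `best_k` and `k_max` are hypothetical names used only for the illustration.

```python
import math

def best_k(n, s, k_max=10_000):
    """Integer k minimizing the proxy k^(-2s) + k/n (approximation term plus dimension over n)."""
    return min(range(1, k_max + 1), key=lambda k: k ** (-2.0 * s) + k / n)

def growth_exponent(s, n_small=10 ** 4, n_large=10 ** 8):
    """Empirical exponent a in best_k(n) ~ n^a, estimated from two sample sizes."""
    ratio = best_k(n_large, s) / best_k(n_small, s)
    return math.log(ratio) / math.log(n_large / n_small)
```

For $s = 2$ the estimated exponent is close to $1/(1+2s) = 0.2$, and plugging the minimizer back into the proxy gives a value of order $n^{-2s/(1+2s)}$, in line with (32).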
To verify that $\beta_n \in \Theta_{j_n}$, we need to make sure
$\beta_n' 1_{m_{j_n}} = 0$ and
$\max_{0 \le r \le q^* - 1} \|D^r \log f_{\beta_n,j_n}\|_\infty \le L^*$. For
the first condition, $\beta_n' 1_{m_{j_n}} = 0$, we can assume it without
loss of generality, because $\log f_{\beta_n,j_n}$ does not change when
$\beta_n$ is shifted by a constant. The second condition holds because of
the second equation in (31).

Now let us verify Assumptions 3-5 with $d_{j_n} = d_{j_n,\infty}$, where
$d_{j_n,\infty}$ is as defined in Assumption 1. For the verification of
Assumption 3, we will use the following fact.

Fact 2. Suppose that

(33) $\int f_0 \left(\log \frac{f_{\eta,j_n}}{f_{\theta,j_n}}\right)^2 d\mu \le K_0\, d_{j_n}^2(\eta, \theta)$ for all $\eta, \theta \in \Theta_{j_n}$

for some constant $K_0$ and

(34) $\sup_{\theta \in \Theta_{j_n}} \|\log f_0 - \log f_{\theta,j_n}\|_\infty \le \log K_3$

for some constant $K_3$. Then Assumption 3 holds with $K_0' = K_0$ and
$K_0'' = K_3/2$.

The proof of the fact is a straightforward application of an equation in
Lemma 1 by Barron and Sheu (1991), which gives

(35) $D(f_0\|f_{\theta,j_n}) \le \frac{1}{2}\, e^{\|\log f_0 - \log f_{\theta,j_n}\|_\infty}\, V(f_0\|f_{\theta,j_n})$

for all $\theta \in \mathbb{R}^{m_{j_n}}$. It is clear that (33) holds with
$K_0 = 1$ and that (34) holds with $K_3 = e^{2L^*}$, so by Fact 2,
Assumption 3 holds.

For Assumption 4, by Theorems IV and XIV of Kolmogorov and Tikhomirov 
(1961), there exists an e n -net F £n for Qj n with respect to dj n so that 

1 \l/(9*-l) 



logcard(F e J < c q ^ L * 



1 X 1 / 8 

c q *,L* ( — < c q * )L *(k n + 1) < c q * )L *m jn . 



£,, 



Therefore, Assumption 4 holds with K^ = e c i*< L * and b\ = 0. 

We will check Assumption 5. For a positive integer to, for t = (ti, . . . , t m ) G 
R m , define 



I i 1 1 oo — max | ti | . 

l<i<m 



To bound irj n (B ( [ jn j n (P n ,E n )), we will show that 

(36) jfl G R m ^ : 9'l mjn = 0, \\9 - < c^-^jj^ C B djn j n {Pn,e n ) 
where cq = min(l,y / 75j~/2(sup n n s /( 1+2s )(&; n + 1) s )). To prove (36), suppose 
that 9 £ R m in and 

0'l mjn =0 and \\e-(3 n \\^<ce(-^-j\ . 

We will show that 

(37) d jn (6,p n )<e n 
and 

(38) 0€Q jn . 
Inequality (37) holds since 

|| ]ogf 0Jn - log/^jJL < 2\\6 - Moc < 2c 6 c 5 n- s /( 1+2s ) < e n , 
where C5 = sup n (/c n + l^~ s n s ^ 1+2s \ Here the second inequality holds because 



\m-^(Pn)\ 



log / e (6-M'B e P' n B-4,(l3 n ) 



< \\9-Pn\ 



To prove (38), we need the following inequality: 

(39) \\D r (e'B-[3'B)\\ Loo <2 r (k + iy\\e-p\\ 00 forallO<r< S , 

which is deduced from (4.54) in Schumaker (1981). Now note that for < 

r < s, 

ll^iog/^-Jloo = Wo'bu 

< \\D r (e'B - /3^)|U + \\D r ((3' n B - log/^Hoo 
+ || J D r log/ || 00 



(39),(31) 



< 2 r (k n + l) r \\6-p n \\ oo + a q *M 
1 



1 



< 



(2 r + a q *M ) + M <L*, 



+ M 



for r = 0, 



l °gfe,j n \\oc ^ II l °gfd,jn ~ lo g//3n,jJloo + II lo S fa ,3n - lo g/o|loo + II lo S/ol 

< 2\\6 - Moo + I) log f Pn>jn - log/oil^ + M 

1 ^ s 



< 



{2 + a q *M Q )+M <L* 
and for r = s, 

\\D s logf 9 ,J Lx = \\D s 6'B\\ Loo 

<\\D s (e'B-p> n B)\\ Loo + \\D s p> n B\\ Loo 

(39),(31) 

< 2 s + a q *M <L*. 

Therefore, 9 £ @k n ,q*,L*, so (38) and (36) hold. To bound nj n (B djn j n (9i,e n )) 
in Assumption 5, note that for all e > and for all j, 

(40) {9 e Qj : || logfej - logfe 1 J 00 < e} C {9 € 6, : ||0 - 9^ < 2/?*,e}, 

where /?*» is some positive constant. This result follows from Lemma 4.3 
of Ghosal, Ghosh and van der Vaart (2000), which implies that for all 9, 
9 1 £ R m ^ , 

\\9 — #i||oo < || l°S/(fl-ei)j„ || co times some constant depending on q* , 
and from the fact that 

II lo g/(0-(?i),j„ - ( lo g/ei,j„ - tog/fyJIL 

= |V(fl-fli)-(VW-^ 1 ))| 



log I exp^'B - </>(0) - (0iB - V(#i))) 



< ll lo g/9i,in -iog/e^JL- 
Then by (40) and by (36) we have 

ir jn (B djn , jn (9 u e n )) ~ 



> 



P**e n (l + (c 4 ^I/e n y/ s y 



For n such that < e n < 1, 



c 6 \ 



^(1 + (C4^I) 1/S ) S . 

Without loss of generality, we can assume that (3** > 1, so it is clear that 
Assumption 5 holds with K 5 = + {c iy /c{) l / s ) s /c 6 and 6 2 = 0. 

For Assumption 6, it should be clear that it holds with the e n specified 
in (32). Now by Theorem 1, the result in Lemma 1 holds. 




4.3. Proof of Lemma 3. We will verify Assumptions 1-6 for the Haar
basis example. To verify Assumption 1, we will apply Lemma 4. First, by
(3.7) in Barron, Birge and Massart (1999), (14) holds for
$K_l = 2^{(l+1)/2}$. Second, for all $j$ and $\theta \in \Theta_j$,
$|\psi(\theta)| = |\log \int e^{\theta' B}| \le \|\theta' B\|_\infty$, so
(15) holds. Therefore, by Lemma 4, Assumption 1 holds for $A_j$ and $m_j$ in
(13). Note that for the $C_j$ specified in (13), $\sum_j e^{-C_j} < 1$ as
required.

To verify Assumption 2, we will first choose $j_n$ and $\beta_n$, and then show that

|| log/ - log f Mn || 2 < C haJo>Ho 

(41) 

$\|\log f_0 - \log f_{\beta_n,j_n}\|_\infty \le 2c_{2,f_0}$

for some constants $c_{1,\alpha,f_0,H_0}$ and $c_{2,f_0}$ and that $\beta_n \in \Theta_{j_n}$. Then we will take $\epsilon_n$ according to an upper bound for the left-hand side of (31) so that Assumption 2 holds. We will see that $\epsilon_n$ converges to zero at the rate $(\log n)^{1/2} n^{-\alpha/(1+2\alpha)}$ as required.

$j_n$ and $\beta_n$ are defined as follows. Let $\{l_n\}$ be a sequence of integers such that

$k_3 n^{1/(1+2\alpha)} \le 2^{l_n+1} \le k_4 n^{1/(1+2\alpha)},$

where $k_3$ and $k_4$ are positive constants. Let

$\beta_0 + \beta_n'B \stackrel{\mathrm{def}}{=} \beta_0 + \sum_{i=1}^{m_{j_n}-1} \beta_{n,i} B_{l_n,i}$

be the $L_2$ projection of $\log f_0$ to the space spanned by 1 and $B_{l_n,i}$: $i = 1, \ldots, m_{j_n} - 1$. Let $M_0 = \|\log f_0\|_\infty$ and $c_{2,f_0} = \sup_n \|\log f_0 - \beta_0 - \beta_n'B\|_\infty$. ($c_{2,f_0}$ is finite since $\beta_0 + \beta_n'B$ converges to $\log f_0$ uniformly.) Define

$L^* = \min\{L : L \text{ is a positive integer and } L \ge 2c_{2,f_0} + 3M_0\}.$

Set $j_n = (l_n, L^*)$.

To prove (41), we will bound $\log f_0 - \beta_0 - \beta_n'B$ and $\beta_0 + \psi(\beta_n)$, respectively. By (12) we have



log fo-fo- P'nBh < " = == < 



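To bound $\beta_0 + \psi(\beta_n)$, let $\Delta = \int (e^{\beta_0 + \beta_n'B - \log f_0} - 1) f_0$ and $b = \|\log f_0 - \beta_0 - \beta_n'B\|_\infty$. Then

$|\beta_0 + \psi(\beta_n)| = \left|\log \int e^{\beta_0 + \beta_n'B - \log f_0} f_0\right| = |\log(1 + \Delta)|$

$\le \max\left(|\Delta|, \frac{|\Delta|}{1+\Delta}\right)$

$\le |\Delta| e^{b+M_0}$ (since $e^{-b-M_0} \le 1 + \Delta \le e^{b+M_0}$)

$\le e^{b+2M_0}\left(\|\log f_0 - \beta_0 - \beta_n'B\|_2 + \frac{1}{2} e^{b} \|\log f_0 - \beta_0 - \beta_n'B\|_2^2\right),$

where the last inequality follows from the Cauchy-Schwarz inequality and (3.3) in Barron and Sheu (1991), which says that

$\frac{z^2}{2} e^{-\max(-z,0)} \le e^z - 1 - z \le \frac{z^2}{2} e^{\max(z,0)}$ for all $z$.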
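The two-sided bound quoted from Barron and Sheu (1991) can be sanity-checked numerically. The following is only a sketch: the function names, the grid and the tolerance are illustrative choices and are not part of the paper.

```python
import math

def barron_sheu_bounds(z):
    """Return (lower, middle, upper) for the inequality
    z^2/2 * exp(-max(-z, 0)) <= exp(z) - 1 - z <= z^2/2 * exp(max(z, 0))."""
    half_sq = 0.5 * z * z
    lower = half_sq * math.exp(-max(-z, 0.0))
    middle = math.expm1(z) - z          # exp(z) - 1 - z, computed stably
    upper = half_sq * math.exp(max(z, 0.0))
    return lower, middle, upper

def check_grid(lo=-5.0, hi=5.0, steps=2001, tol=1e-12):
    """Check the inequality on an evenly spaced grid (illustrative range)."""
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        lower, middle, upper = barron_sheu_bounds(z)
        if not (lower <= middle + tol and middle <= upper + tol):
            return False
    return True
```

A grid check of this kind does not prove the inequality; it only guards against a sign or exponent slip when the bound is transcribed.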

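Therefore, the first inequality in (41) holds. The second inequality in (41) also holds since

$\|\log f_0 + \psi(\beta_n) - \beta_n'B\|_\infty \le \|\log f_0 - \beta_0 - \beta_n'B\|_\infty + |\beta_0 + \psi(\beta_n)|$

$\le c_{2,f_0} + \left|\log \int e^{\beta_0 + \beta_n'B - \log f_0} f_0\right|$

$\le 2c_{2,f_0}.$

Now we have proved (41), which implies that $\|\log f_{\beta_n,j_n}\|_\infty \le L^*$, so $\beta_n \in \Theta_{j_n}$.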

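The $L_2$ bound in (41) gives a bound for the error $\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n}))$ since

(42) $V(f_0\|f_{\beta_n,j_n}) = \int f_0 \left(\log\frac{f_0}{f_{\beta_n,j_n}}\right)^2 \le e^{\|\log f_0\|_\infty} \|\log f_0 - \log f_{\beta_n,j_n}\|_2^2$

and by (35) and (41),

(43) $D(f_0\|f_{\beta_n,j_n}) \le \frac{1}{2} e^{2c_{2,f_0}} V(f_0\|f_{\beta_n,j_n}).$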

By (41)-(43) and the definition of $\eta_{j_n}$, we can find two constants $k_1$ and $k_2$ which depend only on $\alpha$, $f_0$ and $H_0$ such that

$\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n})) + \frac{\eta_{j_n}}{n} \le k_1\left(\frac{1}{m_{j_n}}\right)^{2\alpha} + k_2\frac{m_{j_n}\log m_{j_n}}{n}.$

Since $l_n$ is chosen such that $k_3 n^{1/(1+2\alpha)} \le 2^{l_n+1} \le k_4 n^{1/(1+2\alpha)}$, we have

$\max(D(f_0\|f_{\beta_n,j_n}), V(f_0\|f_{\beta_n,j_n})) + \frac{\eta_{j_n}}{n} \le k_5 n^{-2\alpha/(1+2\alpha)} \log n$

for some constant $k_5$. Hence, Assumption 2 holds with $\epsilon_n^2 = k_5 n^{-2\alpha/(1+2\alpha)} \log n$.
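To see why a dimension of order $n^{1/(1+2\alpha)}$ produces the rate $n^{-2\alpha/(1+2\alpha)}\log n$, the sketch below evaluates a bound of the form $k_1 m^{-2\alpha} + k_2 m\log m/n$ at $m = n^{1/(1+2\alpha)}$. The constants $k_1 = k_2 = 1$ and the function names are placeholders chosen for illustration; they are not values from the paper.

```python
import math

def risk_bound(n, m, alpha, k1=1.0, k2=1.0):
    """Approximation term k1*m^(-2*alpha) plus complexity term k2*m*log(m)/n."""
    return k1 * m ** (-2.0 * alpha) + k2 * m * math.log(m) / n

def target_rate(n, alpha):
    """n^(-2*alpha/(1+2*alpha)) * log(n), the claimed order of eps_n^2."""
    return n ** (-2.0 * alpha / (1.0 + 2.0 * alpha)) * math.log(n)

def ratio_at_balanced_m(n, alpha):
    """Ratio of the bound to the target rate when m = n^(1/(1+2*alpha))."""
    m = n ** (1.0 / (1.0 + 2.0 * alpha))
    return risk_bound(n, m, alpha) / target_rate(n, alpha)
```

With these placeholder constants the ratio equals $1/\log n + 1/(1+2\alpha)$, which stays bounded as $n$ grows, so the two terms are balanced up to the logarithmic factor.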




To verify Assumption 3, for all positive integers $m$ and for all $t = (t_1, \ldots, t_m) \in R^m$ define

$\|t\| = \sqrt{\sum_{i=1}^m t_i^2}.$

Let $d_{j_n} = \|\cdot\|$ on $R^{m_{j_n}-1}$. We will verify Assumption 3 using Fact 2. For $\eta, \theta \in \Theta_{j_n}$, since

V(r ? )-^)=log/e^)' B / r)Jn 

<\og f {I + {9 -rjyBe^'^f^ 



4L* 



< log(l + ||0 — r]\\e 
<e AL *\\9-r]\\, 



||log/^„ - logf 0jj Jl = (V(r?) - V(^)) 2 + \\V 



and 



| ./;.('l,g^) 2 < el'^-ll-IIIog/,^ -logfe, jn f 2 

= e Mo \\logf ri j n -logf ejn \\l, 

(33) holds with $K = e^{M_0}(1 + e^{8L^*})$ and clearly (34) holds with $K_3 = e^{M_0 + 2L^*}$. Therefore, by Fact 2, Assumption 3 holds.
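For checking Assumption 4, note that

$\Theta_{j_n} \subset \{\theta \in R^{m_{j_n}-1} : \|\theta\|_\infty \le L^*\},$

which implies that for every $\epsilon > 0$, there exists an $\epsilon$-net $F_\epsilon$ for $\Theta_{j_n}$ with respect to $\|\cdot\|_\infty$ so that

$\mathrm{card}(F_\epsilon) \le \left(1 + \frac{2L^*}{\epsilon}\right)^{m_{j_n}-1}.$

By the fact that $\|\theta\| \le \sqrt{m_{j_n}-1}\,\|\theta\|_\infty$ for all $\theta \in \Theta_{j_n}$, there exists an $\epsilon_n$-net $F_{\epsilon_n}$ for $\Theta_{j_n}$ with respect to $d_{j_n}$ such that

$\mathrm{card}(F_{\epsilon_n}) \le \left(1 + \frac{2L^*\sqrt{m_{j_n}-1}}{\epsilon_n}\right)^{m_{j_n}-1}.$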
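The covering count for a sup-norm cube can be illustrated with an explicit grid. This is a minimal sketch under stated assumptions: it covers the whole cube $[-L, L]^d$ rather than $\Theta_{j_n}$ itself, the grid centers may lie slightly outside the cube, and all names are hypothetical.

```python
import itertools
import math

def axis_centers(L, eps):
    """Centers spaced 2*eps apart so that every x in [-L, L] is within eps of one."""
    n_pts = max(1, math.ceil(L / eps))
    return [-L + eps * (2 * i + 1) for i in range(n_pts)]

def sup_norm_net(L, eps, dim):
    """Product grid: an eps-cover of the cube [-L, L]^dim in the sup norm."""
    return list(itertools.product(axis_centers(L, eps), repeat=dim))

def sup_dist_to_net(point, net):
    """Sup-norm distance from a point to the nearest center of the net."""
    return min(max(abs(p - c) for p, c in zip(point, center)) for center in net)
```

Each axis uses at most $1 + L/\epsilon \le 1 + 2L/\epsilon$ centers, so the grid has at most $(1 + 2L/\epsilon)^d$ points. Replacing $\epsilon$ by $\epsilon/\sqrt{d}$ turns a sup-norm cover into a Euclidean one, which mirrors the step from $\|\cdot\|_\infty$ to $d_{j_n}$ in the text.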
Since

1 + (2L*y/m jn - T)/e n ^ (1 + 2L* v / fcIA 5 )™ L5Q/(1+2a) 



< 



A 3a ~ U.5a n 1.5a/(l+2a) 

Assumption 4 holds with K 4 = (1 + 2L*^//c 5 )/(^- 5a ) and &i = 3a. 
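For Assumption 5, to bound $\pi_{j_n}(B_{d_{j_n},j_n}(\beta_n, \epsilon_n))$, we will show that

(44) $\left\{\theta \in R^{m_{j_n}-1} : \|\theta - \beta_n\|_\infty \le \frac{\epsilon_n}{m_{j_n}\sqrt{m_{j_n}-1}}\right\} \subset B_{d_{j_n},j_n}(\beta_n, \epsilon_n)$

for $n$ such that $\epsilon_n \le M_0$. For $\theta \in R^{m_{j_n}-1}$ such that $\|\theta - \beta_n\|_\infty \le \epsilon_n/(m_{j_n}\sqrt{m_{j_n}-1})$,

$\|\theta - \beta_n\| \le \sqrt{m_{j_n}-1}\,\|\theta - \beta_n\|_\infty \le \frac{\epsilon_n}{m_{j_n}} \le \epsilon_n,$

so it suffices to show that $\theta \in \Theta_{j_n}$. For $n$ such that $\epsilon_n \le M_0$,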

Halloo < \\0'B - /^Slloo + || A + " log/o||oo + IAI + II log /olloc 

< m jn \\e - M + 2c 2)/o Mo + 2M 

<e n + 2c 2i/o M + 2M 

<2c 2i/o M + 3M <L*, 

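so $\theta \in \Theta_{j_n}$ and (44) holds. To bound $\pi_{j_n}(B_{d_{j_n},j_n}(\theta_1, \epsilon_n))$ in Assumption 5, note that for all $\epsilon > 0$ and for all $j$,

(45) $\{\theta \in \Theta_j : \|\theta - \theta_1\| \le \epsilon\} \subset \{\theta \in \Theta_j : \|\theta - \theta_1\|_\infty \le \epsilon\}.$

By (44) and (45) we have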

Vj n ( B d Jn ,j n (0l,£n)) 



Ti„ ,i n ( A , ) ) V e„ / (m in - 1) 

Since 



4, / "uv^:) 

Assumption 5 holds with $b_2 = 3$ and $K_5 = 1$.

It is clear that Assumption 6 holds with the above $\epsilon_n$, which tends to zero at the rate $(\log n)^{1/2} n^{-\alpha/(1+2\alpha)}$. By Theorem 1, the result in Lemma 3 holds.



4.4. Proof of Theorem 2. We prove Theorem 2 by giving bounds for $U_n$ and $V_n$, and then combining the bounds to show that $U_n/V_n$ converges to zero.

To bound $U_n$, we will use Lemma 9, which is the regression version of Lemma 7.
Lemma 9. Suppose that Assumption 7 holds and $\gamma \in (0, 0.25)$ is defined so that

$0.0056 = \frac{0.13\,\gamma}{c_{2,c_0,M}\sqrt{c_{1,M,\sigma}}\,\sqrt{1-4\gamma}}.$

Then for all $j$ and for all $\xi$ such that

$\frac{\xi}{m_j} \ge \frac{4}{c_{1,M,\sigma}(1-4\gamma)}\log(1072.5A_j),$

n 1 n 

~ £(y, - / (x 4 )) 2 - - £(y, - W ! 



i=l 



n f 







>-7ll/o-/ ej |li 2(Atx) + ^ + 0.0224 



1 n 
n ^— f 



-in -1 n 

J- x — ^ -*- x — ^ o o 

/or some 6* € 0j and — |ej| < cq, — < c 

1=1 1=1 



< 15.1 exp 



ci,m, ct (1 -47)fj 



where

$c_{1,M,\sigma} = \min\left(\frac{1 - \exp(-M^2/(2\sigma^2))}{2M^2}, \frac{1}{2\sigma^2}\right)$ and $c_{2,c_0,M} = 2(c_0 + 2M)$.

The proof of Lemma 9 is long and is deferred to Section 4.4.1.

Now suppose that Assumption 7 holds. Take $c_0 = 2\sigma$ and define $\gamma$ as in Lemma 9. Let $C_j > 0$ be such that $\sum_j e^{-C_j} \le 1$ and define $\eta_j$ and $a_j$ as (18) and (17), respectively. We will apply Lemma 9 to prove (46), which gives an upper bound for $U_n$,



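(46) $P_{f_0}\left[U_n \le a\exp\left(\frac{0.0056Z_n^2}{\sigma} - \frac{\gamma n s_n^2}{4\sigma^2}\right)\right] \ge 1 - (p_1 + p_2 + p_3),$

where

$Z_n = \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^n \epsilon_i \sim N(0,1),$

$p_1 = P\left[\frac{1}{n}\sum_{i=1}^n |\epsilon_i| > c_0\right],$

$p_2 = P\left[\frac{1}{n}\sum_{i=1}^n \epsilon_i^2 > c_0\right]$

and

$p_3 = 15.1\exp\left(-\frac{c_{1,M,\sigma}(1-4\gamma)\gamma n s_n^2}{32(0.5 + 0.0056\sigma)}\right).$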
To prove (46), take



& = Vj + 



Since U n is 

E a ; 



4(0.5 + 0.0056a) 

exp(l/(2a 2 )Er=i(^-/o(^)) 2 ) 
(B L2( , x) ,e^)r exp(l/(2o-2)Er=i(^ - fe,j( X i)) 2 

Lemma 9 gives 

U n < £ aj exp ^ i^-insl + + 0.0224 

^ / 7ns 2 6 0.0112 

y aj exp 1 



E< 



+ ^2 + ~ l^nlvC? 



2a 2 2a 2 

,2 



/ 7ns ^ £. 0.0056. A 



■ exp 



0.0056Z 2 7 ras 2 
cr 4cr 2 



E 



aj exp 



0.5 + 0.0056a 



■Vj 



a exp 



< a exp 



0.0056Z 2 



0" 



4a 2 



E< 



0.0056Z 2 7ns 2 



a 4cr 2 
except on a set of probability no greater than 



_ E N > c ° 



n . 



+ P 



-E e ? >c o 



n . 
i=i 



+ 15.1 exp 



Cl,A/, CT (l - 47)^ 



Note that 

E ex P 



ci,m )0 -(1 - 47)^ 



exp 



< 



exp 



Cl,M,a{ 1 ~ ^l)l ns n 

' 32(0.5 + 0.0056cj) 

C\,M,a{ 1 - ^l)l ns n 

' 32(0.5 + 0.0056(7 ) 



E ex p 



ci,a/, ct (1 - 47)77j 
so now we have the following bound for $U_n$:

$P_{f_0}\left[U_n \le a\exp\left(\frac{0.0056Z_n^2}{\sigma} - \frac{\gamma n s_n^2}{4\sigma^2}\right)\right] \ge 1 - (p_1 + p_2 + p_3).$



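The process of deriving a bound for $V_n$ is the same as in Section 4.1 except for the following changes:

1. Replace $f_0$ by $p_{f_0}$, $f_{\theta,j_n}$ by $p_{f_{\theta,j_n}}$ and Assumptions 2 and 3 by Assumptions 8 and 9.

2. The proof of (28) is modified as follows. First, note that in our regression setting, for all $\theta \in \Theta_j$ and for all $j$,

(47) $D(p_{f_0}\|p_{f_{\theta,j}}) = \frac{\|f_0 - f_{\theta,j}\|^2_{L_2(\mu_X)}}{2\sigma^2}$

and

(48) $V(p_{f_0}\|p_{f_{\theta,j}}) = \frac{\|f_0 - f_{\theta,j}\|^2_{L_2(\mu_X)}}{\sigma^2} + \frac{1}{4\sigma^4}\int (f_0 - f_{\theta,j})^4\,d\mu_X \le \left(\frac{1}{\sigma^2} + \frac{M^2}{\sigma^4}\right)\|f_0 - f_{\theta,j}\|^2_{L_2(\mu_X)}.$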



By (47), (48) and (20), for $\theta \in B_{d_{j_n},j_n}(\beta_n, \epsilon_n)$, we have

\\f/3,j n ~ fe,ju\\ 2 L 2 (fj, x ) 



D(pf \\pf etjn ) < D(p fo \\p Un: . n ) + 



2a 2 



K'f 2 
< 2 , -"-0 £ n 

- n+ 2a 2 

and

$V(p_{f_0}\|p_{f_{\theta,j_n}}) \le \left(2 + \frac{2M^2}{\sigma^2}\right) D(p_{f_0}\|p_{f_{\theta,j_n}}).$

Therefore, (28) holds for 



3. The process of deriving a lower bound for $V_n$ in (30) is modified as follows:

Vn > \e- 2ntl a Jn K jn {B D M) 

ae~ 2nt * ( ( 1 0.0056 \ \/ 1 \ mjn 



a ( n o / 1 0.0056 
- 2 6XP V j "V 1+ 2^2 + 



a 
(49)



+ c 1 (b 1 +b 2 + (log(K 4 K 5 )),) 



o 



1 0.0056 



(19) 

> g eXP ( - 2nt n- n£ n\ 1 + ^2 + 



+ c l {b l + b 2 + (\og{K A K b )) + ) 



. a _ 
~ 2 6 



Knel 



where $c_1 = c_{1,M,\sigma}$ and

1 0.005C 
Here we have used the fact that 



K = 2K' + 1 + ^ + — ^— - + Cl (6! + b 2 + (log(K 4 K 5 )). 



dm 4 / 1072.5 Ay l-4 7 \ . , 

— > i ^log ^ > max 1, log Aj) 

rrij 1 — 47 V 7 / 

for all j. 
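The closed forms used in item 2 for Gaussian regression can be checked pointwise by numerical integration. The sketch below assumes $Y = f_0(x) + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$ and works at a fixed $x$ with $\delta = f_0(x) - f(x)$; the trapezoid rule, the truncation at 12 standard deviations and the function names are illustrative choices, not part of the paper.

```python
import math

def log_ratio(eps, delta, sigma):
    """log p_{f0}(y|x) / p_f(y|x) with y = f0(x) + eps and delta = f0(x) - f(x)."""
    return (2.0 * eps * delta + delta * delta) / (2.0 * sigma ** 2)

def gaussian_moment(g, sigma, half_width=12.0, steps=20001):
    """Trapezoid approximation of E[g(eps)] for eps ~ N(0, sigma^2)."""
    a, b = -half_width * sigma, half_width * sigma
    h = (b - a) / (steps - 1)
    total = 0.0
    for i in range(steps):
        e = a + i * h
        w = 0.5 if i in (0, steps - 1) else 1.0
        dens = math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += w * g(e) * dens
    return total * h

def kl_and_second_moment(delta, sigma):
    """Pointwise Kullback-Leibler term and second moment of the log ratio."""
    d = gaussian_moment(lambda e: log_ratio(e, delta, sigma), sigma)
    v = gaussian_moment(lambda e: log_ratio(e, delta, sigma) ** 2, sigma)
    return d, v
```

Integrating the pointwise values over $\mu_X$ gives the $L_2(\mu_X)$ expressions: the first value should match $\delta^2/(2\sigma^2)$ and the second $\delta^2/\sigma^2 + \delta^4/(4\sigma^4)$, and when $|\delta| \le 2M$ the second is at most $(2 + 2M^2/\sigma^2)$ times the first.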

Now we will bound $U_n/V_n$ by combining (46) and (49). In (46), set

$s_n^2 = \frac{8\sigma^2 K\epsilon_n^2}{\gamma}.$

Then

$\pi(B_{L_2(\mu_X)}(s_n)^c \mid X_1, \ldots, X_n) = \frac{U_n}{V_n} \le 2\exp\left(\frac{0.0056Z_n^2}{\sigma}\right)\exp(-Kn\epsilon_n^2)$

except on a set of probability no greater than



P1+P2 + 15.1 exp 



where 



ci,M,q(l-47)8o- 2 ifne;; 
32(0.5 + 0.0056a) 



1 

£ £i ~jV(0,l), 



+ 



K'ne 2 n ' 



Pi = P 



ri 



and 



P2 = ^P 
Note that $c_0 = 2\sigma > \max(E|\epsilon_i|, E\epsilon_i^2)$, so $p_1 + p_2 \to 0$ as $n \to \infty$. Since $2e^{0.0056Z_n^2/\sigma}$ converges in distribution and $e^{-Kn\epsilon_n^2}$ converges to zero by Assumption 6, we have that $2e^{0.0056Z_n^2/\sigma}e^{-Kn\epsilon_n^2}$ converges to zero in probability. Therefore, $\pi(B_{L_2(\mu_X)}(s_n)^c \mid X_1, \ldots, X_n)$ converges to zero in probability as stated in Theorem 2.

4.4.1. An exponential inequality. We claim that to prove Lemma 9, it suffices to prove Lemma 10, which has a slightly different assumption.

Assumption 10. For some $j \in J$, for $\theta \in \Theta_j$, $\|f_{\theta,j}\|_\infty \le M$, and there exist constants $A > 0$, $m \ge 1$ and $0 < \rho \le A$ such that for any $r > 0$, $\delta \le \rho r$,
G Qj , the (5-covering number 



iV(B i2(Mx))0 .(r),<5,d i)OO )<^ T 

where B^^q.^) ={fl£ Qj-\\fo ~ fejWl^fpx) ^ r i and for V, G ©j, 
dj,oo{r}-,0) = ||/„j - /e ,j 1 1 oo- 



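Lemma 10. Suppose that Assumption 10 holds with

$\rho = \frac{0.13\,\gamma}{c_{2,c_0,M}\sqrt{c_{1,M,\sigma}}\,\sqrt{1-4\gamma}}.$

Then for $\xi$ such that

$\frac{\xi}{m} \ge \frac{4}{c_{1,M,\sigma}(1-4\gamma)}\log\left(\frac{15.4\,c_{2,c_0,M}\sqrt{c_{1,M,\sigma}(1-4\gamma)}\,A}{\gamma}\right),$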

1 n 

£(Y, - fo{Xi)) 2 - - feA*i)) 2 



I > 



n 



i=l 



> ~l\\fo- fe,j\\l 2 ^ x ) 



i=i 

+ - + 4 

n 



n 



i=i 



/or some G 8j and — |ej| < cq, — 2^£j < c 



i=l 



i=l 



< 15.1 exp 



ci,M, CT (l-4 7 )n 
where

$c_{1,M,\sigma} = \min\left(\frac{1 - \exp(-M^2/(2\sigma^2))}{2M^2}, \frac{1}{2\sigma^2}\right)$ and $c_{2,c_0,M} = 2(c_0 + 2M)$.



To see that the claim is true, note that in the proof for (26), dn can be 
replaced by I^O-Of)- Therefore, if Assumption 7 holds, then for all j G J, 
Assumption 10 holds with $A = 3A_j$ and $\rho = 0.0056$. Suppose that Lemma 10 is true. Then Lemma 9 follows by setting $\rho = 0.0056$ and choosing $\gamma$ such that

$\rho = \frac{0.13\,\gamma}{c_{2,c_0,M}\sqrt{c_{1,M,\sigma}}\,\sqrt{1-4\gamma}}.$

Proof of Lemma 10. We follow the proof of Lemma 0 in Yang and Barron (1998). First, divide the space $\Theta_j$ into rings

Then we will put all the bounds for $q_i$ together to complete the proof. So let us focus on one $\Theta_{j,i}$ first. Let $\{\tilde\delta_k\}_{k=0}^\infty$ be a sequence decreasing to zero with $\tilde\delta_0 \le \min(\rho r_0, \delta)$ and define $\delta_k = \tilde\delta_k$ for $k \ge 1$ and $\delta_0 = \tilde\delta_0/2$. Then by assumption we can find a sequence of nets $F_0, F_1, \ldots$, where each $F_k$ is a $\delta_k$ net in $\Theta_{j,i}$ satisfying the cardinal number constraint in Assumption 10. In other words, for each $k$, there exists a mapping $\tau_k : \Theta_{j,i} \to F_k$ such that

$\|f_{\tau_k(\theta),j} - f_{\theta,j}\|_\infty \le \delta_k$ for all $\theta \in \Theta_{j,i}$.

Instead of applying the chaining argument using the nets $F_k$, we will modify the net $F_0$ first and then apply the chaining argument using the nets $\tilde F_k$, where $\tilde F_k = F_k$ for $k \ge 1$ and $\tilde F_0$ is the modified $F_0$. Now modify the net $F_0$ in the following way: Consider a positive number $e$. For each $\theta_0$ in $F_0$, find $\tilde\theta_0$ in

$\tau_0^{-1}(\theta_0) = \{\theta \in \Theta_{j,i} : \tau_0(\theta) = \theta_0\}$

such that

$\|f_0 - f_{\tilde\theta_0,j}\|^2_{L_2(\mu_X)} \le \inf_{\theta \in \tau_0^{-1}(\theta_0)} \|f_0 - f_{\theta,j}\|^2_{L_2(\mu_X)} + e.$



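Define $\tilde\tau(\theta_0) = \tilde\theta_0$, and $\tilde F_0 = \{\tilde\tau(\theta_0) : \theta_0 \in F_0\}$. Define $\tilde\tau_0 = \tilde\tau(\tau_0)$ and $\tilde\tau_k = \tau_k$ for $k \ge 1$. Then by the triangle inequality, $\|f_{\tilde\tau_0(\theta),j} - f_{\theta,j}\|_\infty \le \tilde\delta_0$, so $\tilde F_0$ is a $\tilde\delta_0$ net and for each $k$, $\tilde F_k$ is a $\tilde\delta_k$ net. Now we can start the chaining argument. For each $\theta \in \Theta_{j,i}$, define

$l_0 = \frac{1}{n}\sum_{i=1}^n (Y_i - f_0(X_i))^2 - \frac{1}{n}\sum_{i=1}^n (Y_i - f_{\tilde\tau_0(\theta),j}(X_i))^2$

and

$l_k = \frac{1}{n}\sum_{i=1}^n (Y_i - f_{\tilde\tau_{k-1}(\theta),j}(X_i))^2 - \frac{1}{n}\sum_{i=1}^n (Y_i - f_{\tilde\tau_k(\theta),j}(X_i))^2$

for $k \ge 1$. Then

$\frac{1}{n}\sum_{i=1}^n (Y_i - f_0(X_i))^2 - \frac{1}{n}\sum_{i=1}^n (Y_i - f_{\theta,j}(X_i))^2 = l_0 + \sum_{k=1}^\infty l_k.$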



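Now, instead of giving bounds for $l_k - El_k$ as in Yang and Barron (1998), we will give bounds for $l_k - E_\epsilon l_k$, where

$E_\epsilon l_k = \frac{2}{n}\sum_{i=1}^n \epsilon_i \int (f_{\tilde\tau_k(\theta),j} - f_{\tilde\tau_{k-1}(\theta),j})\,d\mu_X + \|f_0 - f_{\tilde\tau_{k-1}(\theta),j}\|^2_{L_2(\mu_X)} - \|f_0 - f_{\tilde\tau_k(\theta),j}\|^2_{L_2(\mu_X)}$

is the conditional expectation of $l_k$ given $\epsilon_1, \ldots, \epsilon_n$ for $k \ge 1$. Note that

$\sum_{k=1}^\infty E_\epsilon l_k = 2\left(\frac{1}{n}\sum_{i=1}^n \epsilon_i\right)\int (f_{\theta,j} - f_{\tilde\tau_0(\theta),j})\,d\mu_X + \|f_0 - f_{\tilde\tau_0(\theta),j}\|^2_{L_2(\mu_X)} - \|f_0 - f_{\theta,j}\|^2_{L_2(\mu_X)}$

$\le 2\left|\frac{1}{n}\sum_{i=1}^n \epsilon_i\right| \left|\int (f_{\theta,j} - f_{\tilde\tau_0(\theta),j})\,d\mu_X\right| + e$

$\le 4\left|\frac{1}{n}\sum_{i=1}^n \epsilon_i\right|\delta_0 + e \le 4\left|\frac{1}{n}\sum_{i=1}^n \epsilon_i\right|\delta + e,$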

so 



q l < P*(B DB) 

< P* (Uo > -^irf + | - e for some G %i J n 5 
+ ^ P*({l k - E £ l k > r] k for some 9 £ 9^} n B)
k=l 

fe=i 

if Efcli % < 7»f , where 

= |io + jtih ~ E e l k ) >-s- 7||/ - /ejllla^) + | for some 9 £ 9^ 
and 



A; = l 



1 1 

5= -£M^ c o>-E £ B c o 

n r— f n r— f 

k i=i i=i j 



□ 



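To bound $q_i^{(1)}$, we will use the following inequality of Chernoff (1952):

Fact 3. Suppose that $X_i$ are i.i.d. from a distribution with density $g_2$ with respect to measure $\mu$ and $g_1$ is a density with respect to the same measure. Then

$P\left[\frac{1}{n}\sum_{i=1}^n \log\frac{g_1(X_i)}{g_2(X_i)} \ge t\right] \le \exp\left(-\frac{n}{2}\left(d_H^2(g_1, g_2) + t\right)\right).$

Since

$l_0 = \frac{2\sigma^2}{n}\sum_{i=1}^n \log\frac{p_{f_{\tilde\tau_0(\theta),j}}(X_i, Y_i)}{p_{f_0}(X_i, Y_i)},$

Fact 3 implies that for a $\tilde\tau_0(\theta)$,

(50) $P[l_0 \ge t] \le \exp\left(-\frac{n}{2}\left(d_H^2(p_{f_{\tilde\tau_0(\theta),j}}, p_{f_0}) + \frac{t}{2\sigma^2}\right)\right).$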



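To replace the Hellinger distance $d_H(p_{f_{\tilde\tau_0(\theta),j}}, p_{f_0})$ with the $L_2$ distance $\|f_{\tilde\tau_0(\theta),j} - f_0\|_{L_2(\mu_X)}$ in (50), note that

(51) $d_H^2(p_{f_{\tilde\tau_0(\theta),j}}, p_{f_0}) = 2\int \left(1 - \exp\left(-\frac{(f_{\tilde\tau_0(\theta),j}(x) - f_0(x))^2}{8\sigma^2}\right)\right)d\mu_X(x)$

$\ge \frac{1 - \exp(-M^2/(2\sigma^2))}{2M^2}\int (f_{\tilde\tau_0(\theta),j}(x) - f_0(x))^2\,d\mu_X(x)$

$\stackrel{\mathrm{def}}{=} c_{0,M,\sigma}\|f_{\tilde\tau_0(\theta),j} - f_0\|^2_{L_2(\mu_X)}.$

Here the equality follows from direct calculation and the inequality follows from the fact that $(1 - e^{-x})/x$ is decreasing with $x$ on $(0, \infty)$ and that $\|f_{\tilde\tau_0(\theta),j}\|_\infty, \|f_0\|_\infty \le M$. Now by (50) and (51), we have

$P[l_0 \ge t] \le \exp\left(-\frac{n}{2}\left(c_{0,M,\sigma}\|f_{\tilde\tau_0(\theta),j} - f_0\|^2_{L_2(\mu_X)} + \frac{t}{2\sigma^2}\right)\right)$

$\le \exp\left(-\frac{c_{1,M,\sigma}n}{2}\left(\|f_{\tilde\tau_0(\theta),j} - f_0\|^2_{L_2(\mu_X)} + t\right)\right),$

where $c_{1,M,\sigma} = \min(c_{0,M,\sigma}, 1/(2\sigma^2))$. Set $t = -2\gamma r_i^2 + \frac{\xi}{n} - e$. Then for a $\tilde\tau_0(\theta)$,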



P 

Therefore 
(52) 



n 



< exp 



ri_! - 2 7 t- + - - e 
n 



g^ 1 ' < card(Fo) exp 
< card(F ) exp 



Cl,M,a^ / 2 



- 2 7 rf + 1 - e 
n 



(i + l)(l-4 7 ) 



n 



where the last inequality was verified in Yang and Barron (1998), from the 
end of page 111 to the beginning of page 112. 
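The Hellinger computation behind (51) can be verified numerically for two normal densities with a common variance. The sketch below is illustrative only: it takes the squared Hellinger distance to be $\int(\sqrt{p} - \sqrt{q})^2$, uses a plain trapezoid rule, and all function names are hypothetical.

```python
import math

def normal_pdf(y, mean, sigma):
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hellinger_sq_numeric(a, b, sigma, half_width=12.0, steps=40001):
    """Trapezoid approximation of the integral of (sqrt(p) - sqrt(q))^2."""
    lo = min(a, b) - half_width * sigma
    hi = max(a, b) + half_width * sigma
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        y = lo + i * h
        w = 0.5 if i in (0, steps - 1) else 1.0
        diff = math.sqrt(normal_pdf(y, a, sigma)) - math.sqrt(normal_pdf(y, b, sigma))
        total += w * diff * diff
    return total * h

def hellinger_sq_closed(a, b, sigma):
    """Closed form 2 * (1 - exp(-(a - b)^2 / (8 sigma^2)))."""
    return 2.0 * (1.0 - math.exp(-((a - b) ** 2) / (8.0 * sigma ** 2)))

def c0(M, sigma):
    """The constant (1 - exp(-M^2 / (2 sigma^2))) / (2 M^2) from (51)."""
    return (1.0 - math.exp(-M ** 2 / (2.0 * sigma ** 2))) / (2.0 * M ** 2)
```

For mean differences of size at most $2M$ the closed form should dominate $c_{0,M,\sigma}$ times the squared difference, with equality at the endpoint $2M$, which is what the monotonicity of $(1 - e^{-x})/x$ delivers.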

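To bound $q_{i,k}^{(2)}$, we will use Hoeffding's inequality.

Fact 4. Suppose that $\{Y_i\}_{i=1}^n$ are independent with mean zero and that $a_i \le Y_i \le b_i$ for all $i$. Then for $\eta > 0$,

$P\left[\frac{1}{n}\sum_{i=1}^n Y_i \ge \eta\right] \le \exp\left(\frac{-2n^2\eta^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$

For a pair $(\tilde\tau_{k-1}(\theta), \tilde\tau_k(\theta))$,

$|(Y_i - f_{\tilde\tau_{k-1}(\theta),j}(X_i))^2 - (Y_i - f_{\tilde\tau_k(\theta),j}(X_i))^2|$

$\le 2|f_{\tilde\tau_{k-1}(\theta),j}(X_i) - f_{\tilde\tau_k(\theta),j}(X_i)| \cdot \left|\epsilon_i + f_0(X_i) - \frac{f_{\tilde\tau_{k-1}(\theta),j}(X_i) + f_{\tilde\tau_k(\theta),j}(X_i)}{2}\right|$

$\le 2(\tilde\delta_{k-1} + \tilde\delta_k)(|\epsilon_i| + 2M) \le 4(|\epsilon_i| + 2M)\tilde\delta_{k-1}.$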
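Hoeffding's inequality as stated in Fact 4 can be compared against an exact tail probability in a simple case. The sketch below uses i.i.d. signs $Y_i = \pm 1$ with probability 1/2 each, so $a_i = -1$ and $b_i = 1$; this example distribution and the function names are illustrative and do not come from the paper.

```python
import math

def hoeffding_bound(n, eta, ranges):
    """exp(-2 n^2 eta^2 / sum (b_i - a_i)^2), with ranges[i] = b_i - a_i."""
    return math.exp(-2.0 * n ** 2 * eta ** 2 / sum(r * r for r in ranges))

def rademacher_tail(n, eta):
    """Exact P[(1/n) sum Y_i >= eta] for i.i.d. Y_i = +1 or -1 with probability 1/2."""
    # The sum equals 2*B - n with B ~ Binomial(n, 1/2), so we need B >= n(1 + eta)/2.
    k_min = math.ceil(n * (1.0 + eta) / 2.0 - 1e-12)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2.0 ** n
```

In the proof the summands have ranges of size at most $8(|\epsilon_i| + 2M)\tilde\delta_{k-1}$ conditionally on the errors, which is where the factor 64 in the exponent comes from.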
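By Hoeffding's inequality, the conditional probability

$P[l_k - E_\epsilon l_k \ge \eta \mid \epsilon_1, \ldots, \epsilon_n] \le \exp\left(\frac{-2n^2\eta^2}{\sum_{i=1}^n 64(|\epsilon_i| + 2M)^2\tilde\delta_{k-1}^2}\right) \le \exp\left(\frac{-2n\eta^2}{64(c_0 + 2M)^2\tilde\delta_{k-1}^2}\right)$

if $\sum_{i=1}^n |\epsilon_i|/n \le c_0$ and $\sum_{i=1}^n \epsilon_i^2/n \le c_0$. Integrating the conditional probability over set $B$, we have

$P(\{l_k - E_\epsilon l_k \ge \eta\} \cap B) \le \exp\left(\frac{-2n\eta^2}{64(c_0 + 2M)^2\tilde\delta_{k-1}^2}\right).$

Therefore,

(53) $q_{i,k}^{(2)} \le \mathrm{card}(\tilde F_{k-1})\,\mathrm{card}(\tilde F_k)\exp\left(\frac{-2n\eta_k^2}{64(c_0 + 2M)^2\tilde\delta_{k-1}^2}\right).$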

Now combine (52) and (53) and let $e \to 0$. Then we have

q t < card(Fo) exp(-^-^(i + 1)(1 - 4 7 )1 
V 2 n 



+ card(F fe „i) card(F fe ) exp 
fc=l 



-2n?7 2 



A' 



,64(c + 2M) 2 ^_ 1 



Ar* "\ "Y A ri \ m ___ ( -2nrf k 



{\Sk-J \S k J ^\M{c + 2M)Hl_ 1 



Now choose 5$, 6f. so that 



log 



<Q 



ci,M, CT (A: + l)(l-47)e 



and rjk such that 
2nr% 



64(c + 2M) 2 5 2 k _ 1 

. (2fc + l)ci,M, g (l-47)g (i + l)fcci, M , g (l-47)| 

= vm log 2 H ■ — ■ 1 ■ — ■ . 

4 8 

Now the bound for qi becomes 

q { <2 l expl Jexpl ^-(i + 1)(1 - 47)^ 



+ H ex P 

fc=i 



(i + l)ci,M,*k(l - 4 7 )£ 



<exp( — log2 1 

/ (i + l) CMfitJ (l-4 7 )^ 
+ ex P z 
x ^1 — exp
Note that by assumption 
(i + l)ci,M, g (l-4 7 )a



-1 



m 2A ci iM)CT (l - 4 7 )£ 

— log — < — ■ — ■ , 

2 Po ~ 8 



where 



Po 
15.4c 2 , C0 ,mVCi,m,<t(1 - 47)



Since po < p < A, we have 
(54) 



so 



and 



log2 m ,rn 2A c 1)M ,a(l - 4 7 )^ 
< — log 2 < — log — < — ! — '■ , 

2-2 & -2 & p ~ 8 



qi < exp 



(i + l)ci, M ,a(l-47)e 



x ^1 + ^1 — exp 
V2 



Cl,M,a(l - 47X 



-1 N 



< 1 + 



>/2- 1 



exp 



(i + l)ci, M , CT (l-4 7 )a 



f * 



n 1 n 

-£(y, - / D pQ)) 2 - -E(^ - toW) 

i=l i=l 



>-7ll/ -/^lll 2(MJC ) + ^ + 4 
for some E © ? - and — \si\ < Co, — e? < Cn 

OO 



i=l 



i=l 



i=0 



< 1 + 



V2 



y/2-1 
x I 1 — exp 



exp 



Cl,M,aO- ~ 4 7)£ 



(54) 

< 15.1 exp 



Cl,M,a(l-47K 



Cl,Af,a(l - 4 7 )£\ 
It remains to check that $\{\tilde\delta_k\}_{k=0}^\infty$ is a decreasing sequence,

(55) $\sum_{k=1}^\infty \eta_k \le \gamma r_i^2$

and

(56) $\tilde\delta_0 \le \min(r_0\rho, \delta),$

as claimed in the beginning of the proof. By (54), $\tilde\delta_0/\tilde\delta_1 \ge 1$, so $\{\tilde\delta_k\}_{k=0}^\infty$ is decreasing by construction. To verify (55), let $c_2 = 2(c_0 + 2M)$ and $c_1 = c_{1,M,\sigma}$. Then



/e cxp / ci(l-47)e \ / im81og2 | (z + 7) Cl (l-4 7 )e 



n V 4m J \l n n 

( < 4) 2c 2 ^v^4t) expf- Cl(1 7 ^ v / 3i + 9, 
n V 4m / 



and for k > 2, 



^/!-(- sM ^ M ) 



/fm81og2 2(2fc + l)ci(l-47)^ (i + l)fcci(l - 4 7 )£ 

x \ j | 

y n n n 

l i 4) A I ci*(i-47)r 



n \ 4m / 



x v /2(2A; + l) + (i + l)(fc + 2) 

< ^1 Vd(l - 4t) exp(- Clfc(1 4 ^ 47)e ) + 5)(fc + 2) 

< c^iv^l^Oexpf- 01 ^ 1 ~ 4 ^ )Vi + 5. 

n V 8m / 



Therefore, 



f]vk < c 2 A^v / ci(l-4 7 )expf- Cl(1 /| ^- )Vi + 5 
^ n \ Am J 



2\/3 + 



1 - exp(-ci(l - 47)£/(8m)) 



< c 2 ^iv^(1^4 7 )expf- Cl(1 - 47) ^ V52- 
n \ 4m / 
2 ^ + l-exp(- Cl (l-4 7 )£/(8m)))



1 - exp(-ci(l - 47)£/(8m)) 
< l5Ac 2 ^A^ H e xp(- Cl(1 ~ 47) ^ 7 2^ 



VI- 


47 


7 




47 



4m / n 

15.4c 2 ^ exp ^ — J 7 r, . 

To make (55) hold, it is sufficient to require that 

m ci(l — 47) V 7 / 

as in the assumption. Now it remains to verify (56). (56) follows from the 
fact that 

5o = M Vn eXP l Im~ J " W« 

and that 

The proof for Lemma 10 is complete. □ 

4.5. Proof of Lemma 5. We will prove Lemma 5 by verifying the assumptions in Theorem 2. To verify Assumption 7, we will apply Lemma 6. Following the same arguments in the verification of Assumption 1 of Lemma 1 in Section 4.2, we have that (8) and (9) hold with $T_1 = 1$ and $T_2 = 1/(\sqrt{q}(2q+1)9^{q-1})$. By Lemma 6, Assumption 7 holds for $A_j$ and $m_j$ in (22). Note that for the $C_j$ specified in (22), $\sum_j e^{-C_j} = e^{-2}/(1 - e^{-1})^3 \le 1$ as required.
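The value $e^{-2}/(1 - e^{-1})^3$ is what a triple geometric series produces. The sketch below assumes, purely for illustration, that the index $j$ has three integer components with $C_j$ equal to their sum, one component starting at 0 and two starting at 1; the actual definition in (22) is not reproduced in this section, so this form is a hypothesis.

```python
import math

def closed_form():
    """The value e^{-2} / (1 - e^{-1})^3 quoted in the text."""
    return math.exp(-2.0) / (1.0 - math.exp(-1.0)) ** 3

def truncated_sum(limit=60):
    """Truncated triple sum of exp(-(k + q + L)) under the hypothetical indexing."""
    g0 = sum(math.exp(-k) for k in range(0, limit))   # component starting at 0
    g1 = sum(math.exp(-q) for q in range(1, limit))   # component starting at 1
    return g0 * g1 * g1                               # the triple sum factorizes
```

Under this assumed indexing the sum is about 0.536, comfortably below the bound of 1 that the weights are required to satisfy.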

To verify Assumption 8, we choose $j_n$ and $\beta_n$ as in the verification for Assumption 2 in the proof of Lemma 1 except for the following changes:

1. Fact 1 is replaced by Fact 5.



Fact 5. For $j$ such that $q \ge s + 1$, there exists $\beta \in R^{m_j}$ such that

(57) $\|D^r(f_0 - f_{\beta,j})\|_\infty \le a_q\left(\frac{1}{k+1}\right)^{s-r} M_0$ for $0 \le r \le s - 1$, and $\|D^s f_{\beta,j}\|_{L_\infty} \le a_q M_0$,

where $M_0 = \max_{0 \le r \le s}\|D^r f_0\|_{L_\infty}$.

The above fact follows from (6.50) in Schumaker (1981).

2. $\beta_n \in R^{m_{j_n}}$ is chosen so that




2s/(l+2s) 



To verify that $\beta_n \in \Theta_{j_n}$, we need to make sure $\max_{0 \le r \le q^*-1}\|D^r f_{\beta_n,j_n}\|_{L_\infty} \le L^*$ and $\|f_{\beta_n,j_n}\|_\infty \le M$. The first condition follows from the second equation in (57). The second condition holds for large $n$ because of (58) and the fact that $\|f_0\|_\infty < M$. Therefore, Assumption 8 holds for large $n$ for the $\epsilon_n$ in (59).

Assumption 9 holds with $d_{j_n}(\eta, \theta) = \|f_{\eta,j_n} - f_{\theta,j_n}\|_\infty$ for all $\eta, \theta \in \Theta_{j_n}$ since (20) holds with $K_0' = 1$.

For Assumption 4, the verification is the same as the one for Assumption 4 in the proof of Lemma 1.

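To verify Assumption 5, we need to bound $\pi_{j_n}(B_{d_{j_n},j_n}(\beta_n, \epsilon_n))$ by showing that

(60) $\left\{\theta \in R^{m_{j_n}} : \|\theta - \beta_n\|_\infty \le c_6\left(\frac{1}{k_n+1}\right)^s\right\} \subset B_{d_{j_n},j_n}(\beta_n, \epsilon_n),$
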
where cq = min(l, v / cT/(sup n n S// ( 1+2s )(£; ri + 1) s )). For 9 G R m ^ such that 
For $\theta \in R^{m_{j_n}}$ such that $\|\theta - \beta_n\|_\infty \le c_6(1/(k_n+1))^s$, we will prove (37) and (38). The inequality (37) follows from the same arguments as in the verification for (37) in the proof of Lemma 1, except that $\|\log f_{\theta,j_n} - \log f_{\beta_n,j_n}\|_\infty$ is replaced by $\|f_{\theta,j_n} - f_{\beta_n,j_n}\|_\infty$ and the factor 2 is dropped. To prove (38), note that for $0 \le r < s$,

$\|D^r f_{\theta,j_n}\|_\infty \le L^*$ and $\|D^s f_{\theta,j_n}\|_{L_\infty} \le L^*$,

where the results follow from the same arguments for the verification of (38) in the proof of Lemma 1 except that $\log f_{\theta,j_n}$ is replaced by $f_{\theta,j_n}$, $\log f_0$ is replaced by $f_0$ and the case $r = 0$ is combined with the case $0 < r < s$ here. Also,

$\|f_{\theta,j_n}\|_\infty \le \|\theta'B - \beta_n'B\|_\infty + \|\beta_n'B - f_0\|_\infty + \|f_0\|_\infty \le M$

for large $n$ since $\|f_0\|_\infty < M$. Therefore, $\theta \in \Theta_{k_n,q^*,L^*}$ and (60) holds.
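To bound $\pi_{j_n}(B_{d_{j_n},j_n}(\theta_1, \epsilon_n))$ in Assumption 5, note that by Lemma 4.3 of Ghosal, Ghosh and van der Vaart (2000), there exists $\beta^{**} > 1$ such that for all $\epsilon > 0$ and for all $j$,

(61) $\{\theta \in \Theta_j : \|f_{\theta,j} - f_{\theta_1,j}\|_\infty \le \epsilon\} \subset \{\theta \in \Theta_j : \|\theta - \theta_1\|_\infty \le \beta^{**}\epsilon\}.$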

Then by (61) and (60), following the arguments after the verification of 
(40) in the proof of Lemma 1, Assumption 5 holds with K§ = f3**(l + 

(c4^T) 1/s )Vc 6 and b 2 = 0. 

For Assumption 6, it should be clear that it holds with the $\epsilon_n$ specified in (59). Apply Theorem 2 and we have the result in Lemma 5.